Preface to the second and third editions


Since the publication of the first edition, many students and lectur- 
ers have communicated a number of minor typos and other corrections 
to me. There was also some demand for a hardcover edition of the 
texts. Because of this, the publishers and I have decided to incorporate 
the corrections and issue a hardcover second edition of the textbooks. 
The layout, page numbering, and indexing of the texts have also been 
changed; in particular the two volumes are now numbered and indexed 
separately. However, the chapter and exercise numbering, as well as the 
mathematical content, remains the same as the first edition, and so the 
two editions can be used more or less interchangeably for homework and 
study purposes. 

The third edition contains a number of corrections that were reported 
for the second edition, together with a few new exercises, but is otherwise 
essentially the same text. 



Preface to the first edition 


This text originated from the lecture notes I gave teaching the honours 
undergraduate-level real analysis sequence at the University of Califor- 
nia, Los Angeles, in 2003. Among the undergraduates here, real anal- 
ysis was viewed as being one of the most difficult courses to learn, not 
only because of the abstract concepts being introduced for the first time 
(e.g., topology, limits, measurability, etc.), but also because of the level 
of rigour and proof demanded of the course. Because of this perception
of difficulty, one was often faced with the difficult choice of either
reducing the level of rigour in the course in order to make it easier, or
maintaining strict standards and facing the prospect of many
undergraduates, even many of the bright and enthusiastic ones, struggling
with the course material.

Faced with this dilemma, I tried a somewhat unusual approach to 
the subject. Typically, an introductory sequence in real analysis assumes 
that the students are already familiar with the real numbers, with math- 
ematical induction, with elementary calculus, and with the basics of set 
theory, and then quickly launches into the heart of the subject, for in- 
stance the concept of a limit. Normally, students entering this sequence 
do indeed have a fair bit of exposure to these prerequisite topics, though 
in most cases the material is not covered in a thorough manner. For in- 
stance, very few students were able to actually define a real number, or 
even an integer, properly, even though they could visualize these num- 
bers intuitively and manipulate them algebraically. This seemed to me 
to be a missed opportunity. Real analysis is one of the first subjects 
(together with linear algebra and abstract algebra) that a student en- 
counters, in which one truly has to grapple with the subtleties of a truly 
rigorous mathematical proof. As such, the course offered an excellent 
chance to go back to the foundations of mathematics, and in particular 
the opportunity to do a proper and thorough construction of the real
numbers. 

Thus the course was structured as follows. In the first week, I de- 
scribed some well-known “paradoxes” in analysis, in which standard laws 
of the subject (e.g., interchange of limits and sums, or sums and inte- 
grals) were applied in a non-rigorous way to give nonsensical results such 
as 0 = 1. This motivated the need to go back to the very beginning of the
subject, even to the very definition of the natural numbers, and check 
all the foundations from scratch. For instance, one of the first homework 
assignments was to check (using only the Peano axioms) that addition 
was associative for natural numbers (i.e., that (a + b) + c = a + (b + c)
for all natural numbers a, b, c: see Exercise 2.2.1). Thus even in the
first week, the students had to write rigorous proofs using mathematical 
induction. After we had derived all the basic properties of the natural 
numbers, we then moved on to the integers (initially defined as formal 
differences of natural numbers); once the students had verified all the 
basic properties of the integers, we moved on to the rationals (initially 
defined as formal quotients of integers); and then from there we moved 
on (via formal limits of Cauchy sequences) to the reals. Around the 
same time, we covered the basics of set theory, for instance demonstrat- 
ing the uncountability of the reals. Only then (after about ten lectures) 
did we begin what one normally considers the heart of undergraduate 
real analysis - limits, continuity, differentiability, and so forth. 

The response to this format was quite interesting. In the first few 
weeks, the students found the material very easy on a conceptual level, 
as we were dealing only with the basic properties of the standard num- 
ber systems. But on an intellectual level it was very challenging, as one 
was analyzing these number systems from a foundational viewpoint, in 
order to rigorously derive the more advanced facts about these number 
systems from the more primitive ones. One student told me how difficult 
it was to explain to his friends in the non-honours real analysis sequence 
(a) why he was still learning how to show why all rational numbers 
are either positive, negative, or zero (Exercise 4.2.4), while the non- 
honours sequence was already distinguishing absolutely convergent and 
conditionally convergent series, and (b) why, despite this, he thought 
his homework was significantly harder than that of his friends. Another 
student commented to me, quite wryly, that while she could obviously 
see why one could always divide a natural number n into a positive 
integer q to give a quotient a and a remainder r less than q (Exercise 
2.3.5), she still had, to her frustration, much difficulty in writing down 
a proof of this fact. (I told her that later in the course she would have
to prove statements for which it would not be as obvious to see that 
the statements were true; she did not seem to be particularly consoled 
by this.) Nevertheless, these students greatly enjoyed the homework, as 
when they did persevere and obtain a rigorous proof of an intuitive fact,
it solidified the link in their minds between the abstract manipulations 
of formal mathematics and their informal intuition of mathematics (and 
of the real world), often in a very satisfying way. By the time they were 
assigned the task of giving the infamous “epsilon and delta” proofs in 
real analysis, they had already had so much experience with formalizing 
intuition, and in discerning the subtleties of mathematical logic (such 
as the distinction between the “for all” quantifier and the “there exists” 
quantifier), that the transition to these proofs was fairly smooth, and we 
were able to cover material both thoroughly and rapidly. By the tenth 
week, we had caught up with the non-honours class, and the students 
were verifying the change of variables formula for Riemann-Stieltjes in- 
tegrals, and showing that piecewise continuous functions were Riemann 
integrable. By the conclusion of the sequence in the twentieth week, we 
had covered (both in lecture and in homework) the convergence theory of 
Taylor and Fourier series, the inverse and implicit function theorem for 
continuously differentiable functions of several variables, and established 
the dominated convergence theorem for the Lebesgue integral. 

In order to cover this much material, many of the key foundational 
results were left to the student to prove as homework; indeed, this was 
an essential aspect of the course, as it ensured the students truly ap- 
preciated the concepts as they were being introduced. This format has 
been retained in this text; the majority of the exercises consist of proving 
lemmas, propositions and theorems in the main text. Indeed, I would 
strongly recommend that one do as many of these exercises as possible 
- and this includes those exercises proving “obvious” statements - if one 
wishes to use this text to learn real analysis; this is not a subject whose 
subtleties are easily appreciated just from passive reading. Most of the 
chapter sections have a number of exercises, which are listed at the end 
of the section. 

To the expert mathematician, the pace of this book may seem somewhat
slow, especially in early chapters, as there is a heavy emphasis
on rigour (except for those discussions explicitly marked “Informal”),
and on justifying many steps that would ordinarily be quickly passed over
as being self-evident. The first few chapters develop (in painful detail) 
many of the “obvious” properties of the standard number systems, for 
instance that the sum of two positive real numbers is again positive
(Exercise 5.4.1), or that given any two distinct real numbers, one can
find a rational number between them (Exercise 5.4.5). In these
foundational chapters, there is also an emphasis on non-circularity - not
using later, more advanced results to prove earlier, more primitive ones.
In particular, the usual laws of algebra are not used until they are
derived (and
they have to be derived separately for the natural numbers, integers, 
rationals, and reals). The reason for this is that it allows the students 
to learn the art of abstract reasoning, deducing true facts from a lim- 
ited set of assumptions, in the friendly and intuitive setting of number 
systems; the payoff for this practice comes later, when one has to utilize 
the same type of reasoning techniques to grapple with more advanced 
concepts (e.g., the Lebesgue integral). 

The text here evolved from my lecture notes on the subject, and 
thus is very much oriented towards a pedagogical perspective; much 
of the key material is contained inside exercises, and in many cases I 
have chosen to give a lengthy and tedious, but instructive, proof in- 
stead of a slick abstract proof. In more advanced textbooks, the student 
will see shorter and more conceptually coherent treatments of this ma- 
terial, and with more emphasis on intuition than on rigour; however, 
I feel it is important to know how to do analysis rigorously and “by 
hand” first, in order to truly appreciate the more modern, intuitive and 
abstract approach to analysis that one uses at the graduate level and 
beyond. 

The exposition in this book heavily emphasizes rigour and formal- 
ism; however this does not necessarily mean that lectures based on 
this book have to proceed the same way. Indeed, in my own teach- 
ing I have used the lecture time to present the intuition behind the 
concepts (drawing many informal pictures and giving examples), thus 
providing a complementary viewpoint to the formal presentation in the 
text. The exercises assigned as homework provide an essential bridge 
between the two, requiring the student to combine both intuition and 
formal understanding together in order to locate correct proofs for a 
problem. This I found to be the most difficult task for the students, 
as it requires the subject to be genuinely learnt, rather than merely
memorized or vaguely absorbed. Nevertheless, the feedback I received 
from the students was that the homework, while very demanding for 
this reason, was also very rewarding, as it allowed them to connect the 
rather abstract manipulations of formal mathematics with their innate 
intuition on such basic concepts as numbers, sets, and functions. Of 
course, the aid of a good teaching assistant is invaluable in achieving this
connection. 

With regard to examinations for a course based on this text, I would 
recommend either an open-book, open-notes examination with problems 
similar to the exercises given in the text (but perhaps shorter, with no 
unusual trickery involved), or else a take-home examination that involves
problems comparable to the more intricate exercises in the text. The 
subject matter is too vast to force the students to memorize the defini- 
tions and theorems, so I would not recommend a closed-book examina- 
tion, or an examination based on regurgitating extracts from the book. 
(Indeed, in my own examinations I gave a supplemental sheet listing the 
key definitions and theorems which were relevant to the examination 
problems.) Making the examinations similar to the homework assigned 
in the course will also help motivate the students to work through and 
understand their homework problems as thoroughly as possible (as op- 
posed to, say, using flash cards or other such devices to memorize mate- 
rial), which is good preparation not only for examinations but for doing 
mathematics in general. 

Some of the material in this textbook is somewhat peripheral to 
the main theme and may be omitted for reasons of time constraints. 
For instance, as set theory is not as fundamental to analysis as are 
the number systems, the chapters on set theory (Chapters 3, 8) can be 
covered more quickly and with substantially less rigour, or be given as 
reading assignments. The appendices on logic and the decimal system 
are intended as optional or supplemental reading and would probably 
not be covered in the main course lectures; the appendix on logic is 
particularly suitable for reading concurrently with the first few chapters. 
Also, Chapter 11.27 (on Fourier series) is not needed elsewhere in the 
text and can be omitted. 

For reasons of length, this textbook has been split into two volumes. 
The first volume is slightly longer, but can be covered in about thirty 
lectures if the peripheral material is omitted or abridged. The second 
volume refers at times to the first, but can also be taught to students 
who have had a first course in analysis from other sources. It also takes 
about thirty lectures to cover. 

I am deeply indebted to my students, who over the progression of 
the real analysis course corrected several errors in the lecture notes
from which this text is derived, and gave other valuable feedback. I am 
also very grateful to the many anonymous referees who made several 
corrections and suggested many important improvements to the text. 
Chapter 1


Introduction 


1.1 What is analysis? 

This text is an honours-level undergraduate introduction to real
analysis: the analysis of the real numbers, sequences and series of real
numbers, and real-valued functions. This is related to, but is distinct
from, complex analysis, which concerns the analysis of the complex
numbers and complex functions; harmonic analysis, which concerns the
analysis of harmonics (waves) such as sine waves, and how they synthesize
other functions via the Fourier transform; functional analysis, which
focuses much more heavily on functions (and how they form things like
vector spaces); and so forth. Analysis is the rigorous study of such
objects, with a focus on trying to pin down precisely and accurately
the qualitative and quantitative behavior of these objects. Real
analysis is the theoretical foundation which underlies calculus, which is
the collection of computational algorithms which one uses to manipulate
functions.

In this text we will be studying many objects which will be familiar 
to you from freshman calculus: numbers, sequences, series, limits, func- 
tions, definite integrals, derivatives, and so forth. You already have a 
great deal of experience of computing with these objects; however here 
we will be focused more on the underlying theory for these objects. We 
will be concerned with questions such as the following: 

1. What is a real number? Is there a largest real number? After 0, 
what is the “next” real number (i.e., what is the smallest positive 
real number)? Can you cut a real number into pieces infinitely 
many times? Why does a number such as 2 have a square root, 
while a number such as -2 does not? If there are infinitely many 
reals and infinitely many rationals, how come there are “more”
real numbers than rational numbers? 

2. How do you take the limit of a sequence of real numbers? Which 
sequences have limits and which ones don’t? If you can stop a 
sequence from escaping to infinity, does this mean that it must 
eventually settle down and converge? Can you add infinitely many 
real numbers together and still get a finite real number? Can you 
add infinitely many rational numbers together and end up with a 
non-rational number? If you rearrange the elements of an infinite 
sum, is the sum still the same? 

3. What is a function? What does it mean for a function to be 
continuous? differentiable? integrable? bounded? Can you add 
infinitely many functions together? What about taking limits of 
sequences of functions? Can you differentiate an infinite series of 
functions? What about integrating? If a function f(x) takes the 
value 3 when x = 0 and 5 when x = 1 (i.e., f(0) = 3 and f(1) = 5),
does it have to take every intermediate value between 3 and 5 when 
x goes between 0 and 1? Why? 

You may already know how to answer some of these questions from 
your calculus classes, but most likely these sorts of issues were only of 
secondary importance to those courses; the emphasis was on getting you 
to perform computations, such as computing the integral of x sin(x^2)
from x = 0 to x = 1. But now that you are comfortable with these 
objects and already know how to do all the computations, we will go 
back to the theory and try to really understand what is going on. 

1.2 Why do analysis? 

It is a fair question to ask, “why bother?”, when it comes to analysis. 
There is a certain philosophical satisfaction in knowing why things work, 
but a pragmatic person may argue that one only needs to know how 
things work to do real-life problems. The calculus training you receive in 
introductory classes is certainly adequate for you to begin solving many 
problems in physics, chemistry, biology, economics, computer science, 
finance, engineering, or whatever else you end up doing - and you can 
certainly use things like the chain rule, L’Hopital’s rule, or integration 
by parts without knowing why these rules work, or whether there are 
any exceptions to these rules. However, one can get into trouble if 
one applies rules without knowing where they came from and what the
limits of their applicability are. Let me give some examples in which 
several of these familiar rules, if applied blindly without knowledge of 
the underlying analysis, can lead to disaster. 

Example 1.2.1 (Division by zero). This is a very familiar one to you: 
the cancellation law ac = bc ⟹ a = b does not work when c = 0. For
instance, the identity 1 × 0 = 2 × 0 is true, but if one blindly cancels the
0 then one obtains 1 = 2, which is false. In this case it was obvious that 
one was dividing by zero; but in other cases it can be more hidden. 


Example 1.2.2 (Divergent series). You have probably seen geometric 
series such as the infinite sum 


\[ S = 1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \frac{1}{16} + \ldots \]

You have probably seen the following trick to sum this series: if we call
the above sum $S$, then if we multiply both sides by 2, we obtain

\[ 2S = 2 + 1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \ldots = 2 + S \]

and hence $S = 2$, so the series sums to 2. However, if you apply the
same trick to the series

\[ S = 1 + 2 + 4 + 8 + 16 + \ldots \]

one gets nonsensical results:

\[ 2S = 2 + 4 + 8 + 16 + \ldots = S - 1 \implies S = -1. \]

So the same reasoning that shows that
$1 + \frac{1}{2} + \frac{1}{4} + \ldots = 2$ also gives
that $1 + 2 + 4 + 8 + \ldots = -1$. Why is it that we trust the first
equation but not the second? A similar example arises with the series

\[ S = 1 - 1 + 1 - 1 + 1 - 1 + \ldots; \]

we can write

\[ S = 1 - (1 - 1 + 1 - 1 + \ldots) = 1 - S \]

and hence that $S = 1/2$; or instead we can write

\[ S = (1 - 1) + (1 - 1) + (1 - 1) + \ldots = 0 + 0 + \ldots \]

and hence that $S = 0$; or instead we can write

\[ S = 1 + (-1 + 1) + (-1 + 1) + \ldots = 1 + 0 + 0 + \ldots \]

and hence that $S = 1$. Which one is correct? (See Exercise 7.2.1 for an
answer.) 
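
One informal way to probe the three series in Example 1.2.2 is to
tabulate their partial sums. The Python sketch below is only an
illustration and not part of the rigorous development; the helper name
partial_sums is a hypothetical choice, and exact rational arithmetic is
assumed so that rounding plays no role.

```python
from fractions import Fraction

def partial_sums(terms):
    """Return the running totals s_1, s_2, ... of a finite list of terms."""
    total = Fraction(0)
    sums = []
    for t in terms:
        total += t
        sums.append(total)
    return sums

N = 10  # number of terms to add up (an arbitrary illustrative choice)

halving = partial_sums([Fraction(1, 2**k) for k in range(N)])      # 1 + 1/2 + 1/4 + ...
doubling = partial_sums([Fraction(2**k) for k in range(N)])        # 1 + 2 + 4 + ...
alternating = partial_sums([Fraction((-1)**k) for k in range(N)])  # 1 - 1 + 1 - ...

print([float(s) for s in halving])     # creeps up towards 2
print([int(s) for s in doubling])      # grows without bound
print([int(s) for s in alternating])   # flips between 1 and 0
```

The partial sums of the first series approach 2, those of the second grow
without bound, and those of the third alternate between 1 and 0. This
hints that only the first manipulation was applied to a sum that actually
exists.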

Example 1.2.3 (Divergent sequences). Here is a slight variation of the 
previous example. Let x be a real number, and let L be the limit 

\[ L = \lim_{n \to \infty} x^n. \]

Changing variables $n = m + 1$, we have

\[ L = \lim_{m+1 \to \infty} x^{m+1} = \lim_{m+1 \to \infty} x \times x^m = x \lim_{m+1 \to \infty} x^m. \]

But if $m + 1 \to \infty$, then $m \to \infty$, thus

\[ \lim_{m+1 \to \infty} x^m = \lim_{m \to \infty} x^m = \lim_{n \to \infty} x^n = L, \]

and thus

\[ xL = L. \]

At this point we could cancel the $L$'s and conclude that $x = 1$ for an
arbitrary real number $x$, which is absurd. But since we are already
aware of the division by zero problem, we could be a little smarter and
conclude instead that either $x = 1$, or $L = 0$. In particular we seem to
have shown that

\[ \lim_{n \to \infty} x^n = 0 \text{ for all } x \neq 1. \]

But this conclusion is absurd if we apply it to certain values of $x$, for
instance by specializing to the case $x = 2$ we could conclude that the
sequence $1, 2, 4, 8, \ldots$ converges to zero, and by specializing to the
case $x = -1$ we conclude that the sequence $1, -1, 1, -1, \ldots$ also converges to
zero. These conclusions appear to be absurd; what is the problem with 
the above argument? (See Exercise 6.3.4 for an answer.) 
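
Before looking for the flaw in Example 1.2.3, it may help to see what the
sequence of powers actually does for a few sample values of x. This is a
minimal sketch for experimentation only; the function name powers and the
sample values are assumptions made for illustration.

```python
def powers(x, n_max):
    """Return the list [x**0, x**1, ..., x**n_max]."""
    return [x**n for n in range(n_max + 1)]

print(powers(0.5, 6))   # shrinks towards 0
print(powers(2, 6))     # 1, 2, 4, 8, ... keeps growing
print(powers(-1, 6))    # 1, -1, 1, -1, ... keeps oscillating
```

For x = 0.5 the powers shrink towards 0, but for x = 2 and x = -1 the
printed values give no sign of settling down, which suggests that the
limit L may fail to exist for such x; the argument in the example took
its existence for granted.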

Example 1.2.4 (Limiting values of functions). Start with the expression
$\lim_{x \to \infty} \sin(x)$, make the change of variable $x = y + \pi$ and
recall that $\sin(y + \pi) = -\sin(y)$ to obtain

\[ \lim_{x \to \infty} \sin(x) = \lim_{y+\pi \to \infty} \sin(y + \pi) = \lim_{y \to \infty} (-\sin(y)) = -\lim_{y \to \infty} \sin(y). \]

Since $\lim_{x \to \infty} \sin(x) = \lim_{y \to \infty} \sin(y)$ we thus have

\[ \lim_{x \to \infty} \sin(x) = -\lim_{x \to \infty} \sin(x) \]

and hence

\[ \lim_{x \to \infty} \sin(x) = 0. \]

If we then make the change of variables $x = \pi/2 + z$ and recall that
$\sin(\pi/2 + z) = \cos(z)$ we conclude that

\[ \lim_{x \to \infty} \cos(x) = 0. \]

Squaring both of these limits and adding we see that

\[ \lim_{x \to \infty} (\sin^2(x) + \cos^2(x)) = 0^2 + 0^2 = 0. \]

On the other hand, we have $\sin^2(x) + \cos^2(x) = 1$ for all $x$. Thus we
have shown that $1 = 0$! What is the difficulty here?

Example 1.2.5 (Interchanging sums). Consider the following fact of 
arithmetic. Consider any matrix of numbers, e.g.

\[ \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix} \]

and compute the sums of all the rows and the sums of all the columns, 
and then total all the row sums and total all the column sums. In both 
cases you will get the same number - the total sum of all the entries in 
the matrix: 

\[ \begin{array}{ccc|c}
1 & 2 & 3 & 6 \\
4 & 5 & 6 & 15 \\
7 & 8 & 9 & 24 \\ \hline
12 & 15 & 18 & 45
\end{array} \]


To put it another way, if you want to add all the entries in an m x n 
matrix together, it doesn’t matter whether you sum the rows first or 
sum the columns first, you end up with the same answer. (Before the 
invention of computers, accountants and book-keepers would use this 
fact to guard against making errors when balancing their books.) In 
series notation, this fact would be expressed as

\[ \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij} = \sum_{j=1}^{n} \sum_{i=1}^{m} a_{ij} \]

if $a_{ij}$ denoted the entry in the $i$-th row and $j$-th column of the matrix.

Now one might think that this rule should extend easily to infinite 
series: 

\[ \sum_{i=1}^{\infty} \sum_{j=1}^{\infty} a_{ij} = \sum_{j=1}^{\infty} \sum_{i=1}^{\infty} a_{ij}. \]

Indeed, if you use infinite series a lot in your work, you will find yourself 
having to switch summations like this fairly often. Another way of saying 
this fact is that in an infinite matrix, the sum of the row-totals should 
equal the sum of the column-totals. However, despite the reasonableness 
of this statement, it is actually false! Here is a counterexample: 

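\[ \begin{pmatrix}
1 & 0 & 0 & 0 & \ldots \\
-1 & 1 & 0 & 0 & \ldots \\
0 & -1 & 1 & 0 & \ldots \\
0 & 0 & -1 & 1 & \ldots \\
0 & 0 & 0 & -1 & \ldots \\
\vdots & \vdots & \vdots & \vdots & \ddots
\end{pmatrix}. \]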

If you sum up all the rows, and then add up all the row totals, you get 
1; but if you sum up all the columns, and add up all the column totals, 
you get 0! So, does this mean that summations for infinite series should 
not be swapped, and that any argument using such a swapping should 
be distrusted? (See Theorem 8.2.2 for an answer.) 
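
The row and column totals in the counterexample of Example 1.2.5 can be
checked mechanically. The sketch below assumes the pattern suggested by
the displayed matrix (1 on the diagonal, -1 directly below it, 0
elsewhere); the names entry, row_total and column_total are hypothetical
helpers. Each row and each column has at most two nonzero entries, so
every total is a finite sum.

```python
def entry(i, j):
    """Entry a_ij (1-indexed): 1 on the diagonal, -1 just below it, 0 elsewhere."""
    if i == j:
        return 1
    if i == j + 1:
        return -1
    return 0

def row_total(i):
    # Row i is zero to the right of column i, so summing columns 1..i is the full row sum.
    return sum(entry(i, j) for j in range(1, i + 1))

def column_total(j):
    # Column j is nonzero only in rows j and j + 1.
    return sum(entry(i, j) for i in range(j, j + 2))

N = 8  # how many rows and columns to inspect (arbitrary)
row_totals = [row_total(i) for i in range(1, N + 1)]
column_totals = [column_total(j) for j in range(1, N + 1)]
print(row_totals, sum(row_totals))        # first row contributes 1, the rest 0
print(column_totals, sum(column_totals))  # every column cancels to 0

# For comparison, a finite N x N truncation gives the same total both ways.
by_rows = sum(sum(entry(i, j) for j in range(1, N + 1)) for i in range(1, N + 1))
by_cols = sum(sum(entry(i, j) for i in range(1, N + 1)) for j in range(1, N + 1))
print(by_rows, by_cols)
```

The discrepancy therefore only appears for the genuinely infinite
matrix: in any finite truncation the last column is missing its -1, which
is what keeps the two totals equal there.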

Example 1.2.6 (Interchanging integrals). The interchanging of inte- 
grals is a trick which occurs in mathematics just as commonly as the 
interchanging of sums. Suppose one wants to compute the volume under
a surface $z = f(x, y)$ (let us ignore the limits of integration for the
moment). One can do it by slicing parallel to the $x$-axis: for each fixed
value of $y$, we can compute an area $\int f(x, y)\, dx$, and then we
integrate the area in the $y$ variable to obtain the volume

\[ V = \int \int f(x, y)\, dx\, dy. \]

Or we could slice parallel to the $y$-axis for each fixed $x$ and compute an
area $\int f(x, y)\, dy$, and then integrate in the $x$-axis to obtain

\[ V = \int \int f(x, y)\, dy\, dx. \]

This seems to suggest that one should always be able to swap integral
signs:

\[ \int \int f(x, y)\, dx\, dy = \int \int f(x, y)\, dy\, dx. \]

And indeed, people swap integral signs all the time, because sometimes
one variable is easier to integrate in first than the other. However, just as
infinite sums sometimes cannot be swapped, integrals are also sometimes
dangerous to swap. An example is with the integrand $e^{-xy} - xy e^{-xy}$.
Suppose we believe that we can swap the integrals:

\[ \int_0^\infty \int_0^1 (e^{-xy} - xy e^{-xy})\, dy\, dx = \int_0^1 \int_0^\infty (e^{-xy} - xy e^{-xy})\, dx\, dy. \tag{1.1} \]

Since

\[ \int_0^1 (e^{-xy} - xy e^{-xy})\, dy = y e^{-xy} \Big|_{y=0}^{y=1} = e^{-x}, \]

the left-hand side of (1.1) is
$\int_0^\infty e^{-x}\, dx = -e^{-x} \big|_0^\infty = 1$. But since

\[ \int_0^\infty (e^{-xy} - xy e^{-xy})\, dx = x e^{-xy} \Big|_{x=0}^{x=\infty} = 0, \]

the right-hand side of (1.1) is $\int_0^1 0\, dy = 0$. Clearly $1 \neq 0$, so there is an
error somewhere; but you won’t find one anywhere except in the step 
where we interchanged the integrals. So how do we know when to trust 
the interchange of integrals? (See Theorem 11.50.1 for a partial answer.) 
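
The two inner integrals in Example 1.2.6 can be sanity-checked
numerically. The following is a rough sketch rather than a careful
numerical method: it assumes a plain midpoint rule with a fixed number of
steps, replaces the infinite upper limit in x by a finite cutoff R, and
uses the hypothetical helper names integrand, midpoint, inner_dy and
inner_dx.

```python
import math

def integrand(x, y):
    """The integrand e^{-xy} - x*y*e^{-xy} of (1.1), written in factored form."""
    return (1.0 - x * y) * math.exp(-x * y)

def midpoint(g, a, b, steps=2000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (k + 0.5) * h) for k in range(steps))

def inner_dy(x):
    """Approximate the inner integral over 0 <= y <= 1 for a fixed x."""
    return midpoint(lambda y: integrand(x, y), 0.0, 1.0)

def inner_dx(y, R):
    """Approximate the inner integral over 0 <= x <= R (finite cutoff) for a fixed y."""
    return midpoint(lambda x: integrand(x, y), 0.0, R)

for x in (0.5, 1.0, 2.0):
    print(x, inner_dy(x), math.exp(-x))   # the last two columns should nearly agree

for y in (0.25, 0.5, 1.0):
    print(y, inner_dx(y, 60.0))           # should be close to zero
```

The first loop reproduces e^{-x}, and the second shows that the inner
x-integral is close to zero for each fixed y > 0, in line with the two
hand computations above. The numerics thus support the view that both
inner integrals are correct and that the swap itself is the suspect step.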



Example 1.2.7 (Interchanging limits). Suppose we start with the
plausible looking statement

\[ \lim_{x \to 0} \lim_{y \to 0} \frac{x^2}{x^2 + y^2} = \lim_{y \to 0} \lim_{x \to 0} \frac{x^2}{x^2 + y^2}. \tag{1.2} \]

But we have

\[ \lim_{y \to 0} \frac{x^2}{x^2 + y^2} = \frac{x^2}{x^2 + 0^2} = 1, \]

so the left-hand side of (1.2) is 1; on the other hand, we have

\[ \lim_{x \to 0} \frac{x^2}{x^2 + y^2} = \frac{0^2}{0^2 + y^2} = 0, \]

so the right-hand side of (1.2) is 0. Since 1 is clearly not equal to zero, 
this suggests that interchange of limits is untrustworthy. But are there 
any other circumstances in which the interchange of limits is legitimate? 
(See Exercise 11.9.9 for a partial answer.) 
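
The two iterated limits in Example 1.2.7 can be explored by evaluating
the quotient at points near the origin. This is only a numerical
illustration; the function name q and the sample points are assumptions
chosen for the sketch.

```python
def q(x, y):
    """The quotient x^2 / (x^2 + y^2), defined away from the origin."""
    return x**2 / (x**2 + y**2)

small = [10.0 ** (-k) for k in range(1, 6)]   # 0.1, 0.01, ..., 0.00001

print([q(0.1, y) for y in small])   # x fixed, y shrinking: values tend to 1
print([q(x, 0.1) for x in small])   # y fixed, x shrinking: values tend to 0
print([q(t, t) for t in small])     # along the diagonal x = y
```

Along the diagonal x = y the quotient is identically 1/2, so the value
near the origin depends on the direction of approach.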

Example 1.2.8 (Interchanging limits, again). Consider the plausible 
looking statement 


\[ \lim_{x \to 1^-} \lim_{n \to \infty} x^n = \lim_{n \to \infty} \lim_{x \to 1^-} x^n \]

where the notation $x \to 1^-$ means that $x$ is approaching 1 from the
left. When $x$ is to the left of 1, then $\lim_{n \to \infty} x^n = 0$, and hence the
left-hand side is zero. But we also have $\lim_{x \to 1^-} x^n = 1$ for all $n$, and so
the right-hand side limit is 1. Does this demonstrate that this type of 
limit interchange is always untrustworthy? (See Proposition 11.15.3 for 
an answer.) 


Example 1.2.9 (Interchanging limits and integrals). For any real num- 
ber y, we have 


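\[ \int_{-\infty}^{\infty} \frac{1}{1 + (x - y)^2}\, dx = \arctan(x - y) \Big|_{x=-\infty}^{\infty} = \frac{\pi}{2} - \left(-\frac{\pi}{2}\right) = \pi. \]

Taking limits as $y \to \infty$, we should obtain

\[ \int_{-\infty}^{\infty} \lim_{y \to \infty} \frac{1}{1 + (x - y)^2}\, dx = \lim_{y \to \infty} \int_{-\infty}^{\infty} \frac{1}{1 + (x - y)^2}\, dx = \pi. \]

But for every $x$, we have $\lim_{y \to \infty} \frac{1}{1 + (x - y)^2} = 0$. So we seem to have
concluded that $0 = \pi$. What was the problem with the above argument?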
Should one abandon the (very useful) technique of interchanging limits 
and integrals? (See Theorem 11.18.1 for a partial answer.) 

Example 1.2.10 (Interchanging limits and derivatives). Observe that 
if $\varepsilon > 0$, then

\[ \frac{d}{dx} \left( \frac{x^3}{\varepsilon^2 + x^2} \right) = \frac{3x^2(\varepsilon^2 + x^2) - 2x^4}{(\varepsilon^2 + x^2)^2} \]

and in particular that

\[ \frac{d}{dx} \left( \frac{x^3}{\varepsilon^2 + x^2} \right) \Big|_{x=0} = 0. \]

Taking limits as $\varepsilon \to 0$, one might then expect that

\[ \frac{d}{dx} \left( \frac{x^3}{0 + x^2} \right) \Big|_{x=0} = 0. \]

But the left-hand side is $\frac{d}{dx} x = 1$. Does this mean that it is always
illegitimate to interchange limits and derivatives? (See Theorem 11.19.1 
for an answer.) 

Example 1.2.11 (Interchanging derivatives). Let¹ $f(x, y)$ be the
function $f(x, y) := \frac{xy^3}{x^2 + y^2}$. A common manoeuvre in analysis
is to interchange two partial derivatives, thus one expects

\[ \frac{\partial^2 f}{\partial x \partial y}(0, 0) = \frac{\partial^2 f}{\partial y \partial x}(0, 0). \]

But from the quotient rule we have

\[ \frac{\partial f}{\partial y}(x, y) = \frac{3xy^2}{x^2 + y^2} - \frac{2xy^4}{(x^2 + y^2)^2} \]

and in particular

\[ \frac{\partial f}{\partial y}(x, 0) = \frac{0}{x^2} - \frac{0}{x^4} = 0. \]

Thus

\[ \frac{\partial^2 f}{\partial x \partial y}(0, 0) = 0. \]

On the other hand, from the quotient rule again we have

\[ \frac{\partial f}{\partial x}(x, y) = \frac{y^3}{x^2 + y^2} - \frac{2x^2 y^3}{(x^2 + y^2)^2} \]

and hence

\[ \frac{\partial f}{\partial x}(0, y) = \frac{y^3}{y^2} - \frac{0}{y^4} = y. \]

Thus

\[ \frac{\partial^2 f}{\partial y \partial x}(0, 0) = 1. \]

Since $1 \neq 0$, we thus seem to have shown that interchange of
derivatives is untrustworthy. But are there any other circumstances in which
the interchange of derivatives is legitimate? (See Theorem 11.37.4 and
Exercise 11.37.1 for some answers.)

¹ One might object that this function is not defined at $(x, y) = (0, 0)$,
but if we set $f(0, 0) := 0$ then this function becomes continuous and
differentiable for all $(x, y)$, and in fact both partial derivatives
$\frac{\partial f}{\partial x}$, $\frac{\partial f}{\partial y}$ are also
continuous and differentiable for all $(x, y)$!


Example 1.2.12 (L’Hopital’s rule). We are all familiar with the beau- 
tifully simple L’Hopital’s rule 


\[ \lim_{x \to x_0} \frac{f(x)}{g(x)} = \lim_{x \to x_0} \frac{f'(x)}{g'(x)}, \]

but one can still get led to incorrect conclusions if one applies it
incorrectly. For instance, applying it to $f(x) := x$, $g(x) := 1 + x$, and
$x_0 := 0$ we would obtain

\[ \lim_{x \to 0} \frac{x}{1 + x} = \lim_{x \to 0} \frac{1}{1} = 1, \]

but this is the incorrect answer, since
$\lim_{x \to 0} \frac{x}{1 + x} = \frac{0}{1 + 0} = 0$. Of course,
all that is going on here is that L’Hopital’s rule is only applicable when
both $f(x)$ and $g(x)$ go to zero as $x \to x_0$, a condition which was violated
in the above example. But even when $f(x)$ and $g(x)$ do go to zero
as $x \to x_0$ there is still a possibility for an incorrect conclusion. For
instance, consider the limit

\[ \lim_{x \to 0} \frac{x^2 \sin(x^{-4})}{x}. \]

Both numerator and denominator go to zero as $x \to 0$, so it seems pretty
safe to apply L’Hopital’s rule, to obtain

\[ \lim_{x \to 0} \frac{x^2 \sin(x^{-4})}{x} = \lim_{x \to 0} \frac{2x \sin(x^{-4}) - 4x^{-3} \cos(x^{-4})}{1} = \lim_{x \to 0} 2x \sin(x^{-4}) - \lim_{x \to 0} 4x^{-3} \cos(x^{-4}). \]

The first limit converges to zero by the squeeze test (since the function
$2x \sin(x^{-4})$ is bounded above by $2|x|$ and below by $-2|x|$, both of which
go to zero at 0). But the second limit is divergent (because $x^{-3}$ goes
to infinity as $x \to 0$, and $\cos(x^{-4})$ does not go to zero). So the limit
$\lim_{x \to 0} \frac{2x \sin(x^{-4}) - 4x^{-3} \cos(x^{-4})}{1}$ diverges.
One might then conclude using L’Hopital’s rule that
$\lim_{x \to 0} \frac{x^2 \sin(x^{-4})}{x}$ also diverges; however we can
clearly rewrite this limit as $\lim_{x \to 0} x \sin(x^{-4})$, which goes to zero when
$x \to 0$ by the squeeze test again. This does not show that L’Hopital’s
rule is untrustworthy (indeed, it is quite rigorous; see Section 10.5), but 
it still requires some care when applied. 
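
The contrast in the second half of Example 1.2.12 can be seen
numerically by sampling both quotients at points approaching 0. The
sketch below is illustrative only; the names original_quotient and
derivative_quotient are hypothetical, and the sample points
x_k = (k*pi)^(-1/4) are chosen (as an assumption of this sketch) so that
x^(-4) is approximately k*pi and the cosine factor is close to +1 or -1.

```python
import math

def original_quotient(x):
    """x^2 sin(x^-4) / x, the quotient whose limit at 0 is wanted."""
    return x**2 * math.sin(x**-4) / x

def derivative_quotient(x):
    """(2x sin(x^-4) - 4x^-3 cos(x^-4)) / 1, the quotient of the derivatives."""
    return 2 * x * math.sin(x**-4) - 4 * x**-3 * math.cos(x**-4)

for k in (1, 10, 100, 1000):
    x = (k * math.pi) ** -0.25
    # Columns: sample point, original quotient, quotient of derivatives.
    print(x, original_quotient(x), derivative_quotient(x))
```

At these sample points the original quotient stays below |x| in size
while the quotient of derivatives grows in magnitude, consistent with the
observation that the latter has no limit even though the former tends to
zero.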

Example 1.2.13 (Limits and lengths). When you learnt about integration
and how it relates to the area under a curve, you were probably
presented with some picture in which the area under the curve was ap- 
proximated by a bunch of rectangles, whose area was given by a Riemann 
sum, and then one somehow “took limits” to replace that Riemann sum 
with an integral, which then presumably matched the actual area under 
the curve. Perhaps a little later, you learnt how to compute the length 
of a curve by a similar method - approximate the curve by a bunch of 
line segments, compute the length of all the line segments, then take 
limits again to see what you get. 

However, it should come as no surprise by now that this approach 
also can lead to nonsense if used incorrectly. Consider the right-angled 
triangle with vertices (0,0), (1,0), and (0,1), and suppose we wanted 
to compute the length of the hypotenuse of this triangle. Pythagoras’ 
theorem tells us that this hypotenuse has length $\sqrt{2}$, but suppose for 
some reason that we did not know about Pythagoras’ theorem, and 
wanted to compute the length using calculus methods. Well, one way 
to do so is to approximate the hypotenuse by horizontal and vertical 
edges. Pick a large number N, and approximate the hypotenuse by a 
“staircase” consisting of N horizontal edges of equal length, alternating 
with N vertical edges of equal length. Clearly these edges all have length 
1/N, so the total length of the staircase is 2N/N = 2. If one takes limits 
as N goes to infinity, the staircase clearly approaches the hypotenuse, 
and so in the limit we should get the length of the hypotenuse. However, 
as $N \to \infty$, the limit of $2N/N$ is 2, not $\sqrt{2}$, so we have an incorrect value 
for the length of the hypotenuse. How did this happen? 
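
The two quantities in this example can be tabulated directly. The sketch below is an illustration under stated assumptions: the staircase is taken to run from (0,1) to (1,0) with steps of size 1/N, and the function names are invented here. It compares the staircase length with the largest distance from a staircase corner to the hypotenuse.

```python
import math
from fractions import Fraction

def staircase_length(n):
    """Exact total length of n horizontal and n vertical edges, each of length 1/n."""
    return sum(Fraction(1, n) for _ in range(2 * n))

def max_gap(n):
    """Largest distance from an outer staircase corner to the line x + y = 1."""
    corners = [((k + 1) / n, 1 - k / n) for k in range(n)]
    return max(abs(x + y - 1) / math.sqrt(2) for x, y in corners)

for n in (1, 10, 100, 1000):
    print(n, staircase_length(n), max_gap(n), math.sqrt(2))
```

The gap shrinks as N grows, yet the length stays at 2 for every N.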

The analysis you learn in this text will help you resolve these ques- 
tions, and will let you know when these rules (and others) are justified, 
and when they are illegal, thus separating the useful applications of these 
rules from the nonsense. Thus they can prevent you from making mis- 
takes, and can help you place these rules in a wider context. Moreover, 
as you learn analysis you will develop an “analytical way of thinking”, 
which will help you whenever you come into contact with any new rules 
of mathematics, or when dealing with situations which are not quite 
covered by the standard rules. For instance, what if your functions are 
complex-valued instead of real-valued? What if you are working on the 
sphere instead of the plane? What if your functions are not continuous, 
but are instead things like square waves and delta functions? What if 
your functions, or limits of integration, or limits of summation, are occa- 
sionally infinite? You will develop a sense of why a rule in mathematics 
(e.g., the chain rule) works, how to adapt it to new situations, and what 
its limitations (if any) are; this will allow you to apply the mathematics 
you have already learnt more confidently and correctly. 



Chapter 2 


Starting at the beginning: the natural numbers 


In this text, we will review the material you have learnt in high school 
and in elementary calculus classes, but as rigorously as possible. To do 
so we will have to begin at the very basics - indeed, we will go back to the 
concept of numbers and what their properties are. Of course, you have 
dealt with numbers for over ten years and you know how to manipulate 
the rules of algebra to simplify any expression involving numbers, but 
we will now turn to a more fundamental issue, which is: why do the rules 
of algebra work at all? For instance, why is it true that a(b + c) is equal 
to ab + ac for any three numbers a, b, c? This is not an arbitrary choice 
of rule; it can be proven from more primitive, and more fundamental, 
properties of the number system. This will teach you a new skill - how 
to prove complicated properties from simpler ones. You will find that 
even though a statement may be “obvious”, it may not be easy to prove; 
the material here will give you plenty of practice in doing so, and in the 
process will lead you to think about why an obvious statement really is 
obvious. One skill in particular that you will pick up here is the use of 
mathematical induction, which is a basic tool in proving things in many 
areas of mathematics. 

So in the first few chapters we will re-acquaint you with various 
number systems that are used in real analysis. In increasing order of 
sophistication, they are the natural numbers N, the integers Z, the 
rationals Q, and the real numbers R. (There are other number systems 
such as the complex numbers C, but we will not study them until Sec- 
tion 11.26.) The natural numbers {0, 1, 2, . . .} are the most primitive of 
the number systems, but they are used to build the integers, which in 
turn are used to build the rationals. Furthermore, the rationals are used 
to build the real numbers, which are in turn used to build the complex 
numbers. Thus to begin at the very beginning, we must look at the 
natural numbers. We will consider the following question: how does one 
actually define the natural numbers? (This is a very different question 
from how to use the natural numbers, which is something you of course 
know how to do very well. It’s like the difference between knowing how 
to use, say, a computer, versus knowing how to build that computer.) 

This question is more difficult to answer than it looks. The basic 
problem is that you have used the natural numbers for so long that 
they are embedded deeply into your mathematical thinking, and you 
can make various implicit assumptions about these numbers (e.g., that 
a + b is always equal to b + a) without even being aware that you are doing 
so; it is difficult to let go and try to inspect this number system as if it 
is the first time you have seen it. So in what follows I will have to ask 
you to perform a rather difficult task: try to set aside, for the moment, 
everything you know about the natural numbers; forget that you know 
how to count, to add, to multiply, to manipulate the rules of algebra, 
etc. We will try to introduce these concepts one at a time and identify 
explicitly what our assumptions are as we go along - and not allow our- 
selves to use more “advanced” tricks such as the rules of algebra until we 
have actually proven them. This may seem like an irritating constraint, 
especially as we will spend a lot of time proving statements which are 
“obvious”, but it is necessary to do this suspension of known facts to 
avoid circularity (e.g., using an advanced fact to prove a more elemen- 
tary fact, and then later using the elementary fact to prove the advanced 
fact). Also, this exercise will be an excellent way to affirm the founda- 
tions of your mathematical knowledge. Furthermore, practicing your 
proofs and abstract thinking here will be invaluable when we move on 
to more advanced concepts, such as real numbers, functions, sequences 
and series, differentials and integrals, and so forth. In short, the results 
here may seem trivial, but the journey is much more important than 
the destination, for now. (Once the number systems are constructed 
properly, we can resume using the laws of algebra etc. without having 
to rederive them each time.) 

We will also forget that we know the decimal system, which of course 
is an extremely convenient way to manipulate numbers, but it is not 
something which is fundamental to what numbers are. (For instance, 
one could use an octal or binary system instead of the decimal system, 
or even the Roman numeral system, and still get exactly the same set 
of numbers.) Besides, if one tries to fully explain what the decimal 
number system is, it isn’t as natural as you might think. Why is 00423 
the same number as 423, but 32400 isn’t the same number as 324? Why 
is 123.4444 . . . a real number, while . . . 444.321 is not? And why do we 
have to carry digits when adding or multiplying? Why is 0.999 . . . the 
same number as 1? What is the smallest positive real number? Isn’t 
it just 0.00 . . . 001? So to set aside these problems, we will not try to 
assume any knowledge of the decimal system, though we will of course 
still refer to numbers by their familiar names such as 1,2,3, etc. instead 
of using other notation such as I, II, III or 0++, (0++)++, ((0++)++)++ 
(see below) so as not to be needlessly artificial. For completeness, we 
review the decimal system in an Appendix (§B). 

2.1 The Peano axioms 

We now present one standard way to define the natural numbers, in 
terms of the Peano axioms, which were first laid out by Giuseppe Peano 
(1858-1932). This is not the only way to define the natural numbers. 
For instance, another approach is to talk about the cardinality of finite 
sets; for example, one could take a set of five elements and define 5 to be 
the number of elements in that set. We shall discuss this alternate ap- 
proach in Section 3.6. However, we shall stick with the Peano axiomatic 
approach for now. 

How are we to define what the natural numbers are? Informally, we 
could say 

Definition 2.1.1. (Informal) A natural number is any element of the 
set 

N := {0, 1, 2, 3, 4, . . .}, 

which is the set of all the numbers created by starting with 0 and then 
counting forward indefinitely. We call N the set of natural numbers. 

Remark 2.1.2. In some texts the natural numbers start at 1 instead of 
0, but this is a matter of notational convention more than anything else. 
In this text we shall refer to the set {1, 2, 3, . . .} as the positive integers 
Z⁺ rather than the natural numbers. Natural numbers are sometimes 
also known as whole numbers. 

In a sense, this definition solves the problem of what the natural 
numbers are: a natural number is any element of the set¹ N. However, 
it is not really that satisfactory, because it begs the question of what 
N is. This definition of “start at 0 and count indefinitely” seems like 
an intuitive enough definition of N, but it is not entirely acceptable, 
because it leaves many questions unanswered. For instance: how do 
we know we can keep counting indefinitely, without cycling back to 0? 
Also, how do you perform operations such as addition, multiplication, 
or exponentiation? 

¹ Strictly speaking, there is another problem with this informal definition: we have 
not yet defined what a “set” is, or what “element of” is. Thus for the rest of this 
chapter we shall avoid mention of sets and their elements as much as possible, except 
in informal discussion. 

We can answer the latter question first: we can define complicated 
operations in terms of simpler operations. Exponentiation is nothing 
more than repeated multiplication: $5^3$ is nothing more than three fives 
multiplied together. Multiplication is nothing more than repeated addition; 
$5 \times 3$ is nothing more than three fives added together. (Subtraction 
and division will not be covered here, because they are not operations 
which are well-suited to the natural numbers; they will have to wait for 
the integers and rationals, respectively.) And addition? It is nothing 
more than the repeated operation of counting forward, or incrementing. 
If you add three to five, what you are doing is incrementing five three 
times. On the other hand, incrementing seems to be a fundamental op- 
eration, not reducible to any simpler operation; indeed, it is the first 
operation one learns on numbers, even before learning to add. 
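
This reduction can be sketched in Python. The sketch is informal: it borrows Python's built-in integers and loops (which already "know" how to count), and the function names are chosen here for illustration. Its only purpose is to show each operation being built from the one below it.

```python
def increment(n):
    """The one primitive operation: count forward by one."""
    return n + 1

def add(n, m):
    """Add n to m by incrementing m a total of n times."""
    result = m
    for _ in range(n):
        result = increment(result)
    return result

def multiply(n, m):
    """Multiply n by m by adding m to zero a total of n times."""
    result = 0
    for _ in range(n):
        result = add(m, result)
    return result

def power(m, n):
    """Raise m to the power n by multiplying one by m a total of n times."""
    result = 1
    for _ in range(n):
        result = multiply(result, m)
    return result

print(add(3, 5), multiply(3, 5), power(5, 3))  # 8 15 125
```

Only `increment` touches the underlying numbers; the other three functions are defined purely by repetition.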

Thus, to define the natural numbers, we will use two fundamental 
concepts: the zero number 0, and the increment operation. In deference 
to modern computer languages, we will use n++ to denote the increment 
or successor of n, thus for instance 3++ = 4, (3++)++ = 5, etc. This 
is a slightly different usage from that in computer languages such as C, 
where n++ actually redefines the value of n to be its successor; however 
in mathematics we try not to define a variable more than once in any 
given setting, as it can often lead to confusion; many of the statements 
which were true for the old value of the variable can now become false, 
and vice versa. 

So, it seems like we want to say that N consists of 0 and everything 
which can be obtained from 0 by incrementing: N should consist of the 
objects 

0, 0++, (0++)++, ((0++)++)++, etc. 

If we start writing down what this means about the natural numbers, 
we thus see that we should have the following axioms concerning 0 and 
the increment operation ++: 

Axiom 2.1. 0 is a natural number. 
Axiom 2.2. If n is a natural number, then n++ is also a natural number. 


Thus for instance, from Axiom 2.1 and two applications of Axiom 2.2, 
we see that (0++)++ is a natural number. Of course, this notation will 
begin to get unwieldy, so we adopt a convention to write these numbers 
in more familiar notation: 

Definition 2.1.3. We define 1 to be the number 0++, 2 to be the 
number (0++)++, 3 to be the number ((0++)++)++, etc. (In other 
words, 1 := 0++, 2 := 1++, 3 := 2++, etc. In this text I use “x := y” 
to denote the statement that x is defined to equal y.) 

Thus for instance, we have 

Proposition 2.1.4. 3 is a natural number. 

Proof. By Axiom 2.1, 0 is a natural number. By Axiom 2.2, 0++ = 1 is 
a natural number. By Axiom 2.2 again, 1++ = 2 is a natural number. 
By Axiom 2.2 again, 2++ = 3 is a natural number. □ 

It may seem that this is enough to describe the natural numbers. 
However, we have not pinned down completely the behavior of N: 

Example 2.1.5. Consider a number system which consists of the numbers 
0, 1, 2, 3, in which the increment operation wraps back from 3 to 
0. More precisely 0++ is equal to 1, 1++ is equal to 2, 2++ is equal 
to 3, but 3++ is equal to 0 (and also equal to 4, by definition of 4). 
This type of thing actually happens in real life, when one uses a com- 
puter to try to store a natural number: if one starts at 0 and performs 
the increment operation repeatedly, eventually the computer will over- 
flow its memory and the number will wrap around back to 0 (though 
this may take quite a large number of incrementation operations, for 
instance a two-byte representation of an integer will wrap around only 
after 65,536 increments). Note that this type of number system obeys 
Axiom 2.1 and Axiom 2.2, even though it clearly does not correspond 
to what we intuitively believe the natural numbers to be like. 
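
The two-byte case can be simulated directly. In this sketch the fixed-width register is modelled by reducing modulo $2^{16}$; the names `BITS`, `increment`, and `increments_until_wrap` are chosen here for illustration.

```python
BITS = 16  # a two-byte unsigned integer

def increment(n):
    """Increment with wrap-around, as in a fixed-width unsigned register."""
    return (n + 1) % (2 ** BITS)

def increments_until_wrap():
    """Count how many increments, starting from 0, are needed to return to 0."""
    n = increment(0)
    count = 1
    while n != 0:
        n = increment(n)
        count += 1
    return count

print(increments_until_wrap())  # 65536
```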

To prevent this sort of “wrap-around issue” we will impose another 
axiom: 


Axiom 2.3. 0 is not the successor of any natural number; i.e., we have 
n++ ≠ 0 for every natural number n. 
Now we can show that certain types of wrap-around do not occur: 
for instance we can now rule out the type of behavior in Example 2.1.5 
using 

Proposition 2.1.6. 4 is not equal to 0. 

Don’t laugh! Because of the way we have defined 4 - it is the in- 
crement of the increment of the increment of the increment of 0 - it is 
not necessarily true a priori that this number is not the same as zero, 
even if it is “obvious”. (“a priori” is Latin for “beforehand” - it refers to 
what one already knows or assumes to be true before one begins a proof 
or argument. The opposite is “a posteriori” - what one knows to be 
true after the proof or argument is concluded.) Note for instance that 
in Example 2.1.5, 4 was indeed equal to 0, and that in a standard two- 
byte computer representation of a natural number, for instance, 65536 
is equal to 0 (using our definition of 65536 as equal to 0 incremented 
sixty-five thousand, five hundred and thirty-six times). 

Proof. By definition, 4 = 3++. By Axioms 2.1 and 2.2, 3 is a natural 
number. Thus by Axiom 2.3, 3++ ≠ 0, i.e., 4 ≠ 0. □ 

However, even with our new axiom, it is still possible that our num- 
ber system behaves in other pathological ways: 

Example 2.1.7. Consider a number system consisting of five numbers 
0, 1, 2, 3, 4, in which the increment operation hits a “ceiling” at 4. More 
precisely, suppose that 0++ = 1, 1++ = 2, 2++ = 3, 3++ = 4, but 
4++ = 4 (or in other words that 5 = 4, and hence 6 = 4, 7 = 4, etc.). 
This does not contradict Axioms 2.1, 2.2, 2.3. Another number system 
with a similar problem is one in which incrementation wraps around, 
but not to zero, e.g. suppose that 4++ = 1 (so that 5 = 1, then 6 = 2, 
etc.). 

There are many ways to prohibit the above types of behavior from 
happening, but one of the simplest is to assume the following axiom: 

Axiom 2.4. Different natural numbers must have different successors; 
i.e., if n, m are natural numbers and n ≠ m, then n++ ≠ m++. Equivalently², 
if n++ = m++, then we must have n = m. 

² This is an example of reformulating an implication using its contrapositive; see 
Section A.2 for more details. In the converse direction, if n = m, then n++ = m++; 
this is the axiom of substitution (see Section A.7) applied to the operation ++. 
Thus, for instance, we have 

Proposition 2.1.8. 6 is not equal to 2. 

Proof. Suppose for sake of contradiction that 6 = 2. Then 5++ = 1++, 
so by Axiom 2.4 we have 5 = 1, so that 4++ = 0++. By Axiom 2.4 again 
we then have 4 = 0, which contradicts our previous proposition. □ 
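
The finite systems of Examples 2.1.5 and 2.1.7 can be checked mechanically against Axioms 2.2, 2.3, and 2.4. The sketch below is an illustration only: it represents each system by a Python dictionary sending every element to its increment, and the names `check_axioms`, `wrap_to_zero`, `ceiling`, and `wrap_to_one` are invented here.

```python
def check_axioms(successor):
    """Report which of Axioms 2.2-2.4 hold in a finite system containing 0.

    `successor` maps each element of the system to its increment.
    """
    elements = set(successor)
    closed = all(successor[n] in elements for n in elements)        # Axiom 2.2
    zero_not_successor = all(successor[n] != 0 for n in elements)   # Axiom 2.3
    injective = len(set(successor.values())) == len(elements)       # Axiom 2.4
    return {"2.2": closed, "2.3": zero_not_successor, "2.4": injective}

wrap_to_zero = {0: 1, 1: 2, 2: 3, 3: 0}        # Example 2.1.5
ceiling = {0: 1, 1: 2, 2: 3, 3: 4, 4: 4}       # Example 2.1.7, first system
wrap_to_one = {0: 1, 1: 2, 2: 3, 3: 4, 4: 1}   # Example 2.1.7, second system

for name, system in [("wrap to zero", wrap_to_zero),
                     ("ceiling", ceiling),
                     ("wrap to one", wrap_to_one)]:
    print(name, check_axioms(system))
```

The first system fails Axiom 2.3 only, while the other two fail Axiom 2.4 only, matching the discussion in the text.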

As one can see from this proposition, it now looks like we can keep all 
of the natural numbers distinct from each other. There is however still 
one more problem: while the axioms (particularly Axioms 2.1 and 2.2) 
allow us to confirm that 0, 1, 2, 3, . . . are distinct elements of N, there is 
the problem that there may be other “rogue” elements in our number 
system which are not of this form: 

Example 2.1.9. (Informal) Suppose that our number system N con- 
sisted of the following collection of integers and half-integers: 

N := {0, 0.5, 1, 1.5, 2, 2.5, 3, 3.5, . . .}. 

(This example is marked “informal” since we are using real numbers, 
which we’re not supposed to use yet.) One can check that Axioms 2.1- 
2.4 are still satisfied for this set. 

What we want is some axiom which says that the only numbers in N 
are those which can be obtained from 0 and the increment operation - 
in order to exclude elements such as 0.5. But it is difficult to quantify 
what we mean by “can be obtained from” without already using the 
natural numbers, which we are trying to define. Fortunately, there is an 
ingenious solution to try to capture this fact: 

Axiom 2.5 (Principle of mathematical induction). Let P(n) be any 
property pertaining to a natural number n. Suppose that P(0) is true, 
and suppose that whenever P(n) is true, P(n++) is also true. Then 
P(n) is true for every natural number n. 

Remark 2.1.10. We are a little vague on what “property” means at 
this point, but some possible examples of P(n) might be “n is even”; 
“n is equal to 3”; “n solves the equation $(n + 1)^2 = n^2 + 2n + 1$”; and 
so forth. Of course we haven’t defined many of these concepts yet, but 
when we do, Axiom 2.5 will apply to these properties. (A logical remark: 
Because this axiom refers not just to variables, but also properties, it is 
of a different nature than the other four axioms; indeed, Axiom 2.5 
should technically be called an axiom schema rather than an axiom - it 
is a template for producing an (infinite) number of axioms, rather than 
being a single axiom in its own right. To discuss this distinction further 
is far beyond the scope of this text, though, and falls in the realm of 
logic.) 

The informal intuition behind this axiom is the following. Suppose 
P(n) is such that P(0) is true, and such that whenever P(n) is true, 
then P(n++) is true. Then since P(0) is true, P(0++) = P(1) is true. 
Since P(1) is true, P(1++) = P(2) is true. Repeating this indefinitely, 
we see that P(0), P(1), P(2), P(3), etc. are all true - however this 
line of reasoning will never let us conclude that P(0.5), for instance, is 
true. Thus Axiom 2.5 should not hold for number systems which contain 
“unnecessary” elements such as 0.5. (Indeed, one can give a “proof” of 
this fact. Apply Axiom 2.5 to the property P(n) = “n is not a half-integer”, 
where a half-integer is an integer plus 0.5. Then P(0) is true, and if P(n) is true, 
then P(n++) is true. Thus Axiom 2.5 asserts that P(n) is true for all 
natural numbers n, i.e., no natural number can be a half-integer. In 
particular, 0.5 cannot be a natural number. This “proof” is not quite 
genuine, because we have not defined such notions as “integer”, “half- 
integer”, and “0.5” yet, but it should give you some idea as to how the 
principle of induction is supposed to prohibit any numbers other than 
the “true” natural numbers from appearing in N.) 

The principle of induction gives us a way to prove that a property 
P(n) is true for every natural number n. Thus in the rest of this text 
we will see many proofs which have a form like this: 

Proposition 2.1.11. A certain property P(n) is true for every natural 
number n. 

Proof. We use induction. We first verify the base case n = 0, i.e., we 
prove P(0). (Insert proof of P(0) here). Now suppose inductively that n 
is a natural number, and P(n) has already been proven. We now prove 
P(n++). (Insert proof of P(n++), assuming that P(n) is true, here). 
This closes the induction, and thus P(n) is true for all numbers n. □ 

Of course we will not necessarily use the exact template, wording, 
or order in the above type of proof, but the proofs using induction will 
generally be something like the above form. There are also some other 
variants of induction which we shall encounter later, such as backwards 
induction (Exercise 2.2.6), strong induction (Proposition 2.2.14), and 
transfinite induction (Lemma 8.5.15). 
Axioms 2.1-2.5 are known as the Peano axioms for the natural numbers. 
They are all very plausible, and so we shall make 

Assumption 2.6. (Informal) There exists a number system N, whose 
elements we will call natural numbers, for which Axioms 2.1-2.5 are 
true. 

We will make this assumption a bit more precise once we have laid 
down our notation for sets and functions in the next chapter. 

Remark 2.1.12. We will refer to this number system N as the natural 
number system. One could of course consider the possibility that there 
is more than one natural number system, e.g., we could have the Hindu-Arabic 
number system {0, 1, 2, 3, . . .} and the Roman number system 
{O, I, II, III, IV, V, VI, . . .}, and if we really wanted to be annoying we 
could view these number systems as different. But these number systems 
are clearly equivalent (the technical term is isomorphic), because one 
can create a one-to-one correspondence 0 ↔ O, 1 ↔ I, 2 ↔ II, etc. 
which maps the zero of the Hindu-Arabic system to the zero of the 
Roman system, and which is preserved by the increment operation (e.g., 
if 2 corresponds to II, then 2++ will correspond to II++). For a more 
precise statement of this type of equivalence, see Exercise 3.5.13. Since 
all versions of the natural number system are equivalent, there is no 
point in having distinct natural number systems, and we will just use a 
single natural number system to do mathematics. 

We will not prove Assumption 2.6 (though we will eventually include 
it in our axioms for set theory, see Axiom 3.7), and it will be the only 
assumption we will ever make about our numbers. A remarkable ac- 
complishment of modern analysis is that just by starting from these five 
very primitive axioms, and some additional axioms from set theory, we 
can build all the other number systems, create functions, and do all the 
algebra and calculus that we are used to. 

Remark 2.1.13. (Informal) One interesting feature about the natural 
numbers is that while each individual natural number is finite, the set of 
natural numbers is infinite; i.e., N is infinite but consists of individually 
finite elements. (The whole is greater than any of its parts.) There 
are no infinite natural numbers; one can even prove this using Axiom 
2.5, provided one is comfortable with the notions of finite and infinite. 
(Clearly 0 is finite. Also, if n is finite, then clearly n++ is also finite. 
Hence by Axiom 2.5, all natural numbers are finite.) So the natural 
numbers can approach infinity, but never actually reach it; infinity is 
not one of the natural numbers. (There are other number systems which 
admit “infinite” numbers, such as the cardinals, ordinals, and p-adics, 
but they do not obey the principle of induction, and in any event are 
beyond the scope of this text.) 

Remark 2.1.14. Note that our definition of the natural numbers is ax- 
iomatic rather than constructive. We have not told you what the natural 
numbers are (so we do not address such questions as what the numbers 
are made of, are they physical objects, what do they measure, etc.) - 
we have only listed some things you can do with them (in fact, the only 
operation we have defined on them right now is the increment one) and 
some of the properties that they have. This is how mathematics works 
- it treats its objects abstractly, caring only about what properties the 
objects have, not what the objects are or what they mean. If one wants 
to do mathematics, it does not matter whether a natural number means 
a certain arrangement of beads on an abacus, or a certain organization 
of bits in a computer’s memory, or some more abstract concept with no 
physical substance; as long as you can increment them, see if two of them 
are equal, and later on do other arithmetic operations such as add and 
multiply, they qualify as numbers for mathematical purposes (provided 
they obey the requisite axioms, of course). It is possible to construct 
the natural numbers from other mathematical objects - from sets, for 
instance - but there are multiple ways to construct a working model of 
the natural numbers, and it is pointless, at least from a mathematician’s 
standpoint, to argue about which model is the “true” one - as long as 
it obeys all the axioms and does all the right things, that’s good enough 
to do maths. 

Remark 2.1.15. Historically, the realization that numbers could be 
treated axiomatically is very recent, not much more than a hundred 
years old. Before then, numbers were generally understood to be in- 
extricably connected to some external concept, such as counting the 
cardinality of a set, measuring the length of a line segment, or the mass 
of a physical object, etc. This worked reasonably well, until one was 
forced to move from one number system to another; understanding 
numbers in terms of counting beads, for instance, is great for 
conceptualizing the numbers 3 and 5, but doesn’t work so well for $-3$ 
or $1/3$ or $\sqrt{2}$ or $3 + 4i$; thus each great advance in the theory of numbers 
- negative numbers, irrational numbers, complex numbers, even 
the number zero - led to a lot of unnecessary philosophical anguish. 
The great discovery of the late nineteenth century was that numbers 
can be understood abstractly via axioms, without necessarily needing a 
concrete model; of course a mathematician can use any of these models 
when it is convenient, to aid his or her intuition and understanding, but 
they can also be just as easily discarded when they begin to get in the 
way. 

One consequence of the axioms is that we can now define sequences 
recursively. Suppose we want to build a sequence $a_0, a_1, a_2, \ldots$ of numbers 
by first defining $a_0$ to be some base value, e.g., $a_0 := c$ for some 
number c, and then by letting $a_1$ be some function of $a_0$, $a_1 := f_0(a_0)$, 
$a_2$ be some function of $a_1$, $a_2 := f_1(a_1)$, and so forth. In general, we 
set $a_{n++} := f_n(a_n)$ for some function $f_n$ from N to N. By using all 
the axioms together we will now conclude that this procedure will give 
a single value to the sequence element $a_n$ for each natural number n. 
More precisely³: 

Proposition 2.1.16 (Recursive definitions). Suppose for each natural 
number n, we have some function $f_n : \mathbf{N} \to \mathbf{N}$ from the natural numbers 
to the natural numbers. Let c be a natural number. Then we can assign 
a unique natural number $a_n$ to each natural number n, such that $a_0 = c$ 
and $a_{n++} = f_n(a_n)$ for each natural number n. 

Proof. (Informal) We use induction. We first observe that this procedure 
gives a single value to $a_0$, namely c. (None of the other definitions 
$a_{n++} := f_n(a_n)$ will redefine the value of $a_0$, because of Axiom 
2.3.) Now suppose inductively that the procedure gives a single value 
to $a_n$. Then it gives a single value to $a_{n++}$, namely $a_{n++} := f_n(a_n)$. 
(None of the other definitions $a_{m++} := f_m(a_m)$ will redefine the value 
of $a_{n++}$, because of Axiom 2.4.) This completes the induction, and so 
$a_n$ is defined for each natural number n, with a single value assigned to 
each $a_n$. □ 

Note how all of the axioms had to be used here. In a system which 
had some sort of wrap-around, recursive definitions would not work 
because some elements of the sequence would constantly be redefined. 
For instance, in Example 2.1.5, in which 3++ = 0, there would 
be (at least) two conflicting definitions for $a_0$, either c or $f_3(a_3)$. In 
a system which had superfluous elements such as 0.5, the element $a_{0.5}$ 
would never be defined. 

³ Strictly speaking, this proposition requires one to define the notion of a function, 
which we shall do in the next chapter. However, this will not be circular, as the 
concept of a function does not require the Peano axioms. Proposition 2.1.16 can be 
formalized more rigorously in the language of set theory; see Exercise 3.5.12. 

Recursive definitions are very powerful; for instance, we can use them 
to define addition and multiplication, to which we now turn. 
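
The recursive procedure of Proposition 2.1.16 can be sketched in Python. This is an informal illustration using Python's built-in integers: the names `define_recursively` and `wraparound_claims`, the convention that `f(n, a)` stands for $f_n(a)$, and the particular choices of `f` are all assumptions made here.

```python
def define_recursively(c, f, count):
    """Return [a_0, ..., a_{count-1}] with a_0 = c and a_{n++} = f(n, a_n)."""
    sequence = [c]
    for n in range(count - 1):
        sequence.append(f(n, sequence[n]))
    return sequence

def wraparound_claims(c, f, modulus):
    """In a system where (modulus - 1)++ = 0, return the two values claimed for a_0.

    The first is the base value c; the second is f_{modulus-1}(a_{modulus-1}).
    """
    a = c
    for n in range(modulus):
        a = f(n, a)
    return c, a

# Hypothetical choice f_n(a) = (n + 1) * a with c = 1, which yields the factorials.
print(define_recursively(1, lambda n, a: (n + 1) * a, 6))  # [1, 1, 2, 6, 24, 120]

# In the system of Example 2.1.5 (3++ = 0), with the hypothetical choice f_n(a) = n,
# the element a_0 receives two conflicting values.
print(wraparound_claims(0, lambda n, a: n, 4))  # (0, 3)
```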


2.2 Addition 

The natural number system is very bare right now: we have only one 
operation - increment - and a handful of axioms. But now we can build 
up more complex operations, such as addition. 

The way it works is the following. To add three to five should be the 
same as incrementing five three times - this is one increment more than 
adding two to five, which is one increment more than adding one to five, 
which is one increment more than adding zero to five, which should just 
give five. So we give a recursive definition for addition as follows. 

Definition 2.2.1 (Addition of natural numbers). Let m be a natural 
number. To add zero to m, we define 0 + m := m. Now suppose 
inductively that we have defined how to add n to m. Then we can add 
n++ to m by defining (n++) + m := (n + m)++. 

Thus 0 + m is m, 1 + m = (0++) + m is m++; 2 + m = (1++) + m = 
(m++)++; and so forth; for instance we have 2 + 3 = (3++)++ = 
4++ = 5. From our discussion of recursion in the previous section 
we see that we have defined n + m for every natural number n. Here 
we are specializing the previous general discussion to the setting where 
$a_n = n + m$ and $f_n(a_n) = a_n{+}{+}$. Note that this definition is asymmetric: 
3 + 5 is incrementing 5 three times, while 5 + 3 is incrementing 3 five 
times. Of course, they both yield the same value of 8. More generally, it 
is a fact (which we shall prove shortly) that a + b = b + a for all natural 
numbers a, b, although this is not immediately clear from the definition. 
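
Definition 2.2.1 translates almost word for word into a recursive function. In the following Python sketch, the classes `Zero` and `Succ` and the helper `numeral` are hypothetical names introduced here to model 0 and the increment operation; only `add` corresponds to the definition itself.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Zero:
    """The natural number 0."""

@dataclass(frozen=True)
class Succ:
    """The increment n++ of a natural number n."""
    pred: object

def add(n, m):
    """Definition 2.2.1: 0 + m := m and (n++) + m := (n + m)++."""
    if isinstance(n, Zero):
        return m
    return Succ(add(n.pred, m))

def numeral(k):
    """Build the number written k in decimal (a convenience, not part of the definition)."""
    n = Zero()
    for _ in range(k):
        n = Succ(n)
    return n

print(add(numeral(2), numeral(3)) == numeral(5))                # True
print(add(numeral(3), numeral(5)) == add(numeral(5), numeral(3)))  # True
```

The recursion is on the first argument only, which reflects the asymmetry of the definition; the second check confirms one instance of a + b = b + a but of course proves nothing in general.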

Notice that we can prove easily, using Axioms 2.1, 2.2, and induction 
(Axiom 2.5), that the sum of two natural numbers is again a natural 
number (why?). 

Right now we only have two facts about addition: that 0 + m = m, 
and that (n++) + m = (n + m)++. Remarkably, this turns out to be 
enough to deduce everything else we know about addition. We begin 
with some basic lemmas⁴. 

Lemma 2.2.2. For any natural number n, n + 0 = n. 

Note that we cannot deduce this immediately from 0 + m = m be- 
cause we do not know yet that a + b = b + a. 

Proof. We use induction. The base case 0 + 0 = 0 follows since we 
know that 0 + m = m for every natural number m, and 0 is a natural 
number. Now suppose inductively that n + 0 = n. We wish to show 
that (n++) + 0 = n++. But by definition of addition, (n++) + 0 is equal 
to (n + 0)++, which is equal to n++ since n + 0 = n. This closes the 
induction. □ 

Lemma 2.2.3. For any natural numbers n and m, n + (m++) = (n +
m)++.

Again, we cannot deduce this yet from (n++) + m = (n + m)++
because we do not know yet that a + b = b + a. 

Proof. We induct on n (keeping m fixed). We first consider the base 
case n = 0. In this case we have to prove 0 + (m++) = (0 + m)++. 
But by definition of addition, 0 + (m++) = m++ and 0 + m = m, so
both sides are equal to m++ and are thus equal to each other. Now
we assume inductively that n + (m++) = (n + m)++; we now have to
show that (n++) + (m++) = ((n++) + m)++. The left-hand side is (n +
(m++))++ by definition of addition, which is equal to ((n + m)++)++
by the inductive hypothesis. Similarly, we have (n++) + m = (n + m)++
by the definition of addition, and so the right-hand side is also equal to
((n + m)++)++. Thus both sides are equal to each other, and we have
closed the induction. □ 


4 From a logical point of view, there is no difference between a lemma, proposition, 
theorem, or corollary - they are all claims waiting to be proved. However, we use 
these terms to suggest different levels of importance and difficulty. A lemma is an 
easily proved claim which is helpful for proving other propositions and theorems, but 
is usually not particularly interesting in its own right. A proposition is a statement 
which is interesting in its own right, while a theorem is a more important statement 
than a proposition which says something definitive on the subject, and often takes 
more effort to prove than a proposition or lemma. A corollary is a quick consequence 
of a proposition or theorem that was proven recently. 

As a particular corollary of Lemma 2.2.2 and Lemma 2.2.3 we see
that n++ = n + 1 (why?).

As promised earlier, we can now prove that a + b = b + a. 

Proposition 2.2.4 (Addition is commutative). For any natural num- 
bers n and m, n + m = m + n. 

Proof. We shall use induction on n (keeping m fixed). First we do the 
base case n = 0, i.e., we show 0 + m = m + 0. By the definition of 
addition, 0 + m = m, while by Lemma 2.2.2, m + 0 = m. Thus the 
base case is done. Now suppose inductively that n + m = m + n, now 
we have to prove that (n++) + m = m + (n++) to close the induction. 
By the definition of addition, (n++) + m = (n + m)++. By Lemma
2.2.3, m + (n++) = (m + n)++, but this is equal to (n + m)++ by the
inductive hypothesis n + m = m + n. Thus (n++) + m = m + (n++)
and we have closed the induction. □ 

Proposition 2.2.5 (Addition is associative). For any natural numbers 
a, b, c, we have (a + b) + c = a + (b + c).

Proof. See Exercise 2.2.1. □ 

Because of this associativity we can write sums such as a + b + c 
without having to worry about which order the numbers are being added 
together. 
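
A finite check is no substitute for the induction proofs above, but it can be reassuring to test the statements on small inputs. The sketch below is only an illustration: the function names and the bound 8 are arbitrary choices, and Python integers again stand in for the natural numbers.

```python
def add(n: int, m: int) -> int:
    """n + m by recursion on n: 0 + m := m and (k++) + m := (k + m)++."""
    return m if n == 0 else add(n - 1, m) + 1


def check_addition_laws(bound: int) -> bool:
    """Test Lemmas 2.2.2, 2.2.3 and Propositions 2.2.4, 2.2.5 below `bound`."""
    numbers = range(bound)
    for a in numbers:
        if add(a, 0) != a:                       # Lemma 2.2.2: a + 0 = a
            return False
        for b in numbers:
            if add(a, b + 1) != add(a, b) + 1:   # Lemma 2.2.3: a + (b++) = (a + b)++
                return False
            if add(a, b) != add(b, a):           # Proposition 2.2.4: commutativity
                return False
            for c in numbers:
                # Proposition 2.2.5: (a + b) + c = a + (b + c)
                if add(add(a, b), c) != add(a, add(b, c)):
                    return False
    return True


print(check_addition_laws(8))
```

Such a check can only ever cover finitely many cases; it is the inductive arguments that establish the identities for all natural numbers.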

Now we develop a cancellation law. 

Proposition 2.2.6 (Cancellation law). Let a,b,c be natural numbers 
such that a + b = a + c. Then we have b = c. 

Note that we cannot use subtraction or negative numbers yet to prove 
this proposition, because we have not developed these concepts yet. In 
fact, this cancellation law is crucial in letting us define subtraction (and 
the integers) later on in this text, because it allows for a sort of “virtual 
subtraction” even before subtraction is officially defined. 

Proof. We prove this by induction on a. First consider the base case 
a = 0. Then we have 0 + b = 0 + c, which by definition of addition 
implies that b = c as desired. Now suppose inductively that we have the 
cancellation law for a (so that a + b = a + c implies b = c); we now have
to prove the cancellation law for a++. In other words, we assume that
(a++) + b = (a++) + c and need to show that b = c. By the definition
of addition, (a++) + b = (a + b)++ and (a++) + c = (a + c)++ and so
we have (a + b)++ = (a + c)++. By Axiom 2.4, we have a + b = a + c.
Since we already have the cancellation law for a, we thus have b = c as 
desired. This closes the induction. □ 

We now discuss how addition interacts with positivity. 

Definition 2.2.7 (Positive natural numbers). A natural number n is 
said to be positive iff it is not equal to 0. (“iff” is shorthand for “if and 
only if” - see Section A.1).

Proposition 2.2.8. If a is positive and b is a natural number, then a + b
is positive (and hence b + a is also, by Proposition 2.2.4).

Proof. We use induction on b. If b = 0, then a + b = a + 0 = a, which
is positive, so this proves the base case. Now suppose inductively that
a + b is positive. Then a + (b++) = (a + b)++, which cannot be zero by
Axiom 2.3, and is hence positive. This closes the induction. □ 

Corollary 2.2.9. If a and b are natural numbers such that a + b = 0, 
then a = 0 and b = 0.

Proof. Suppose for sake of contradiction that a ≠ 0 or b ≠ 0. If a ≠ 0
then a is positive, and hence a + b = 0 is positive by Proposition 2.2.8, a
contradiction. Similarly if b ≠ 0 then b is positive, and again a + b = 0 is
positive by Proposition 2.2.8, a contradiction. Thus a and b must both
be zero. □ 

Lemma 2.2.10. Let a be a positive number. Then there exists exactly 
one natural number b such that b++ = a. 

Proof. See Exercise 2.2.2. □ 

Once we have a notion of addition, we can begin defining a notion 
of order. 

Definition 2.2.11 (Ordering of the natural numbers). Let n and m be 
natural numbers. We say that n is greater than or equal to m, and write
n ≥ m or m ≤ n, iff we have n = m + a for some natural number a.
We say that n is strictly greater than m, and write n > m or m < n, iff
n ≥ m and n ≠ m.

Thus for instance 8 > 5, because 8 = 5 + 3 and 8 ≠ 5. Also note that
n++ > n for any n; thus there is no largest natural number n, because 
the next number n++ is always larger still. 
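
Definition 2.2.11 phrases order purely in terms of addition, and this can be mirrored directly in code. In the sketch below the names geq and gt are hypothetical; the search for a witness a is restricted to 0, 1, ..., n, which suffices because any a with n = m + a cannot exceed n.

```python
def geq(n: int, m: int) -> bool:
    """n >= m iff n = m + a for some natural number a (Definition 2.2.11)."""
    # Any witness a satisfies a <= n, so searching 0, 1, ..., n is enough.
    return any(n == m + a for a in range(n + 1))


def gt(n: int, m: int) -> bool:
    """n > m iff n >= m and n != m."""
    return geq(n, m) and n != m


print(gt(8, 5))   # True, because 8 = 5 + 3 and 8 != 5
print(gt(5, 5))   # False: 5 >= 5 holds, but 5 = 5
print(geq(3, 4))  # False: no natural number a satisfies 3 = 4 + a
```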

Proposition 2.2.12 (Basic properties of order for natural numbers).
Let a, b, c be natural numbers. Then

(a) (Order is reflexive) a ≥ a.

(b) (Order is transitive) If a ≥ b and b ≥ c, then a ≥ c.

(c) (Order is anti-symmetric) If a ≥ b and b ≥ a, then a = b.

(d) (Addition preserves order) a ≥ b if and only if a + c ≥ b + c.

(e) a < b if and only if a++ ≤ b.

(f) a < b if and only if b = a + d for some positive number d.

Proof. See Exercise 2.2.3. □ 

Proposition 2.2.13 (Trichotomy of order for natural numbers). Let a 
and b be natural numbers. Then exactly one of the following statements 
is true: a < b, a = b, or a > b. 

Proof. This is only a sketch of the proof; the gaps will be filled in Exer- 
cise 2.2.4. 

First we show that we cannot have more than one of the statements 
a < b, a = b, a > b holding at the same time. If a < b then a ≠ b by
definition, and if a > b then a ≠ b by definition. If a > b and a < b then
by Proposition 2.2.12 we have a = b, a contradiction. Thus no more
than one of the statements is true. 

Now we show that at least one of the statements is true. We keep b 
fixed and induct on a. When a = 0 we have 0 ≤ b for all b (why?), so
we have either 0 = b or 0 < b, which proves the base case. Now suppose
we have proven the proposition for a, and now we prove the proposition
for a++. From the trichotomy for a, there are three cases: a < b, a = b,
and a > b. If a > b, then a++ > b (why?). If a = b, then a++ > b
(why?). Now suppose that a < b. Then by Proposition 2.2.12, we have
a++ ≤ b. Thus either a++ = b or a++ < b, and in either case we are
done. This closes the induction. □ 
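
The trichotomy can also be tested on small inputs. The following sketch (with hypothetical names geq and relations) lists which of the three statements hold for a given pair, using only the addition-based definition of order; it is a sanity check under these assumptions, not a replacement for the proof.

```python
def geq(n: int, m: int) -> bool:
    """n >= m iff n = m + a for some natural number a."""
    return any(n == m + a for a in range(n + 1))


def relations(a: int, b: int) -> list:
    """Return the statements among 'a < b', 'a = b', 'a > b' that hold."""
    holds = []
    if geq(b, a) and a != b:
        holds.append("a < b")
    if a == b:
        holds.append("a = b")
    if geq(a, b) and a != b:
        holds.append("a > b")
    return holds


# Exactly one statement should hold for every pair that we test.
print(all(len(relations(a, b)) == 1 for a in range(12) for b in range(12)))
```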

The properties of order allow one to obtain a stronger version of the 
principle of induction: 

Proposition 2.2.14 (Strong principle of induction). Let m_0 be a natural
number, and let P(m) be a property pertaining to an arbitrary natural
number m. Suppose that for each m ≥ m_0, we have the following
implication: if P(m') is true for all natural numbers m_0 ≤ m' < m, then
P(m) is also true. (In particular, this means that P(m_0) is true, since
in this case the hypothesis is vacuous.) Then we can conclude that P(m)
is true for all natural numbers m ≥ m_0.

Remark 2.2.15. In applications we usually use this principle with m_0 =
0 or m_0 = 1.

Proof. See Exercise 2.2.5. □ 


— Exercises — 

Exercise 2.2.1. Prove Proposition 2.2.5. (Hint: fix two of the variables and 
induct on the third.) 

Exercise 2.2.2. Prove Lemma 2.2.10. (Hint: use induction.) 

Exercise 2.2.3. Prove Proposition 2.2.12. (Hint: you will need many of the 
preceding propositions, corollaries, and lemmas.) 

Exercise 2.2.4. Justify the three statements marked (why?) in the proof of 
Proposition 2.2.13. 

Exercise 2.2.5. Prove Proposition 2.2.14. (Hint: define Q(n) to be the property
that P(m) is true for all m_0 ≤ m < n; note that Q(n) is vacuously true when
n < m_0.)

Exercise 2.2.6. Let n be a natural number, and let P(m) be a property
pertaining to the natural numbers such that whenever P(m++) is true, then P(m)
is true. Suppose that P(n) is also true. Prove that P(m) is true for all natural
numbers m ≤ n; this is known as the principle of backwards induction. (Hint:
apply induction to the variable n.) 

2.3 Multiplication 

In the previous section we have proven all the basic facts that we know to 
be true about addition and order. To save space and to avoid belaboring 
the obvious, we will now allow ourselves to use all the rules of algebra 
concerning addition and order that we are familiar with, without further 
comment. Thus for instance we may write things like a + b + c = c +
b + a without supplying any further justification. Now we introduce 
multiplication. Just as addition is the iterated increment operation, 
multiplication is iterated addition: 

Definition 2.3.1 (Multiplication of natural numbers). Let m be a natural
number. To multiply zero to m, we define 0 × m := 0. Now suppose
inductively that we have defined how to multiply n to m. Then we can
multiply n++ to m by defining (n++) × m := (n × m) + m.

Thus for instance 0 × m = 0, 1 × m = 0 + m, 2 × m = 0 + m + m,
etc. By induction one can easily verify that the product of two natural 
numbers is a natural number. 
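
The recursion in Definition 2.3.1 has the same shape as the one for addition, and a Python sketch makes the parallel explicit. The name multiply is hypothetical; since we now allow ourselves the usual rules for addition, the built-in + is used for the addition step.

```python
def multiply(n: int, m: int) -> int:
    """Compute n × m by recursion on n, mirroring Definition 2.3.1."""
    if n == 0:
        return 0  # 0 × m := 0
    # Here n = k++ with k = n - 1, and (k++) × m := (k × m) + m.
    return multiply(n - 1, m) + m


print(multiply(2, 5))  # 2 × 5 = 0 + 5 + 5 = 10
```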

Lemma 2.3.2 (Multiplication is commutative). Let n,m be natural 
numbers. Then n × m = m × n.

Proof. See Exercise 2.3.1. □

We will now abbreviate n × m as nm, and use the usual convention
that multiplication takes precedence over addition, thus for instance
ab + c means (a × b) + c, not a × (b + c). (We will also use the usual
notational conventions of precedence for the other arithmetic operations 
when they are defined later, to save on using parentheses all the time.) 

Lemma 2.3.3 (Positive natural numbers have no zero divisors). Let 
n, m be natural numbers. Then n × m = 0 if and only if at least one of
n, m is equal to zero. In particular, if n and m are both positive, then
nm is also positive.

Proof. See Exercise 2.3.2. □

Proposition 2.3.4 (Distributive law). For any natural numbers a,b,c, 
we have a(b + c) = ab + ac and (b + c)a = ba + ca. 

Proof. Since multiplication is commutative we only need to show the first 
identity a(b + c) = ab + ac. We keep a and b fixed, and use induction 
on c. Let’s prove the base case c = 0, i.e., a(b + 0) = ab + a0. The 
left-hand side is ab, while the right-hand side is ab + 0 = ab, so we are 
done with the base case. Now let us suppose inductively that a(b + c) =
ab + ac, and let us prove that a(b + (c++)) = ab + a(c++). The left-
hand side is a((b + c)++) = a(b + c) + a, while the right-hand side is
ab + ac + a = a(b + c) + a by the induction hypothesis, and so we can 
close the induction. □ 

Proposition 2.3.5 (Multiplication is associative). For any natural 
numbers a, b, c, we have (a × b) × c = a × (b × c).


Proof. See Exercise 2.3.3. □

Proposition 2.3.6 (Multiplication preserves order). If a, b are natural
numbers such that a < b, and c is positive, then ac < bc.

Proof. Since a < b, we have b = a + d for some positive d. Multiplying 
by c and using the distributive law we obtain bc = ac + dc. Since
d is positive, and c is positive, dc is positive, and hence ac < bc as
desired. □ 

Corollary 2.3.7 (Cancellation law). Let a, b, c be natural numbers such
that ac = bc and c is non-zero. Then a = b.

Remark 2.3.8. Just as Proposition 2.2.6 will allow for a “virtual sub- 
traction” which will eventually let us define genuine subtraction, this 
corollary provides a “virtual division” which will be needed to define 
genuine division later on. 

Proof. By the trichotomy of order (Proposition 2.2.13), we have three 
cases: a < b, a = b, a > b. Suppose first that a < b; then by Proposition
2.3.6 we have ac < bc, a contradiction. We can obtain a similar
contradiction when a > b. Thus the only possibility is that a = b, as 
desired. □ 

With these propositions it is easy to deduce all the familiar rules of 
algebra involving addition and multiplication, see for instance Exercise 
2.3.4. 

Now that we have the familiar operations of addition and multipli- 
cation, the more primitive notion of increment will begin to fall by the 
wayside, and we will see it rarely from now on. In any event we can 
always use addition to describe incrementation, since n++ = n + 1.

Proposition 2.3.9 (Euclidean algorithm). Let n be a natural number, 
and let q be a positive number. Then there exist natural numbers m, r 
such that 0 ≤ r < q and n = mq + r.

Remark 2.3.10. In other words, we can divide a natural number n by 
a positive number q to obtain a quotient m (which is another natural 
number) and a remainder r (which is less than q). This algorithm marks 
the beginning of number theory, which is a beautiful and important
subject but one which is beyond the scope of this text. 


Proof. See Exercise 2.3.5. □
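
To see how a quotient and remainder might actually be produced, here is a Python sketch that follows the shape suggested by the hint to Exercise 2.3.5 (fix q and recurse on n). The name divide is hypothetical, and the code illustrates the computation only; it is not the requested proof.

```python
def divide(n: int, q: int) -> tuple:
    """Return (m, r) with 0 <= r < q and n = m*q + r, by recursion on n."""
    if q == 0:
        raise ValueError("q must be positive")
    if n == 0:
        return (0, 0)  # 0 = 0*q + 0
    m, r = divide(n - 1, q)  # n - 1 = m*q + r with 0 <= r < q
    if r + 1 < q:
        return (m, r + 1)    # n = m*q + (r + 1), remainder still below q
    return (m + 1, 0)        # r + 1 = q, so n = (m + 1)*q + 0


print(divide(17, 5))  # (3, 2), since 17 = 3*5 + 2
```

The two branches of the recursive step correspond to the two cases one would expect to treat in the inductive step of a proof.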


Just like one uses the increment operation to recursively define ad- 
dition, and addition to recursively define multiplication, one can use 
multiplication to recursively define exponentiation: 

Definition 2.3.11 (Exponentiation for natural numbers). Let m be 
a natural number. To raise m to the power 0, we define m^0 := 1; in
particular, we define 0^0 := 1. Now suppose recursively that m^n has been
defined for some natural number n, then we define m^(n++) := m^n × m.

Examples 2.3.12. Thus for instance x^1 = x^0 × x = 1 × x = x; x^2 =
x^1 × x = x × x; x^3 = x^2 × x = x × x × x; and so forth. By induction we
see that this recursive definition defines x^n for all natural numbers n.
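
The recursive definition of exponentiation can be sketched in the same style as addition and multiplication. The name power is hypothetical, and the built-in * is used for the multiplication step.

```python
def power(m: int, n: int) -> int:
    """Compute m^n by recursion on n, mirroring Definition 2.3.11."""
    if n == 0:
        return 1  # m^0 := 1, including the convention 0^0 := 1
    # Here n = k++ with k = n - 1, and m^(k++) := m^k × m.
    return power(m, n - 1) * m


print(power(2, 3))  # 2^3 = 2 × 2 × 2 = 8
print(power(0, 0))  # 1, by the convention in the definition
```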

We will not develop the theory of exponentiation too deeply here, 
but instead wait until after we have defined the integers and rational 
numbers; see in particular Proposition 4.3.10. 

— Exercises — 

Exercise 2.3.1. Prove Lemma 2.3.2. (Hint: modify the proofs of Lemmas 2.2.2, 
2.2.3 and Proposition 2.2.4.) 

Exercise 2.3.2. Prove Lemma 2.3.3. (Hint: prove the second statement first.) 

Exercise 2.3.3. Prove Proposition 2.3.5. (Hint: modify the proof of Proposition
2.2.5 and use the distributive law.) 

Exercise 2.3.4. Prove the identity (a + b)^2 = a^2 + 2ab + b^2 for all natural
numbers a, b.

Exercise 2.3.5. Prove Proposition 2.3.9. (Hint: fix q and induct on n.) 



Chapter 3 


Set theory 


Modern analysis, like most of modern mathematics, is concerned with 
numbers, sets, and geometry. We have already introduced one type 
of number system, the natural numbers. We will introduce the other 
number systems shortly, but for now we pause to introduce the concepts 
and notation of set theory, as they will be used increasingly heavily in 
later chapters. (We will not pursue a rigorous description of Euclidean 
geometry in this text, preferring instead to describe that geometry in 
terms of the real number system by means of the Cartesian co-ordinate 
system.)

While set theory is not the main focus of this text, almost every other 
branch of mathematics relies on set theory as part of its foundation, so 
it is important to get at least some grounding in set theory before doing 
other advanced areas of mathematics. In this chapter we present the 
more elementary aspects of axiomatic set theory, leaving more advanced 
topics such as a discussion of infinite sets and the axiom of choice to 
Chapter 8. A full treatment of the finer subtleties of set theory (of 
which there are many!) is unfortunately well beyond the scope of this 
text. 


3.1 Fundamentals 

In this section we shall set out some axioms for sets, just as we did for 
the natural numbers. For pedagogical reasons, we will use a somewhat 
overcomplete list of axioms for set theory, in the sense that some of the
axioms can be used to deduce others, but there is no real harm in doing 
this. We begin with an informal description of what sets should be. 

Definition 3.1.1. (Informal) We define a set A to be any unordered 
collection of objects, e.g., {3, 8, 5, 2} is a set. If x is an object, we say
that x is an element of A or x ∈ A if x lies in the collection; otherwise
we say that x ∉ A. For instance, 3 ∈ {1, 2, 3, 4, 5} but 7 ∉ {1, 2, 3, 4, 5}.

This definition is intuitive enough, but it doesn’t answer a number 
of questions, such as which collections of objects are considered to be 
sets, which sets are equal to other sets, and how one defines operations 
on sets (e.g., unions, intersections, etc.). Also, we have no axioms yet 
on what sets do, or what their elements do. Obtaining these axioms and 
defining these operations will be the purpose of the remainder of this 
section. 

We first clarify one point: we consider sets themselves to be a type 
of object. 

Axiom 3.1 (Sets are objects). If A is a set, then A is also an object. 
In particular, given two sets A and B, it is meaningful to ask whether 
A is also an element of B. 

Example 3.1.2. (Informal) The set {3, {3, 4}, 4} is a set of three distinct 
elements, one of which happens to itself be a set of two elements. See 
Example 3.1.10 for a more formal version of this example. However, not 
all objects are sets; for instance, we typically do not consider a natural 
number such as 3 to be a set. (The more accurate statement is that 
natural numbers can be the cardinalities of sets, rather than necessarily 
being sets themselves. See Section 3.6.) 

Remark 3.1.3. There is a special case of set theory, called “pure 
set theory”, in which all objects are sets; for instance the number 0 
might be identified with the empty set ∅ = {}, the number 1 might
be identified with {0} = {{}}, the number 2 might be identified with 
{0, 1} = {{}, {{}}}, and so forth. From a logical point of view, pure set 
theory is a simpler theory, since one only has to deal with sets and not 
with objects; however, from a conceptual point of view it is often easier 
to deal with impure set theories in which some objects are not consid- 
ered to be sets. The two types of theories are more or less equivalent 
for the purposes of doing mathematics, and so we shall take an agnostic 
position as to whether all objects are sets or not. 
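
For readers curious about what the identification in Remark 3.1.3 looks like concretely, here is a Python sketch using frozenset (an immutable set type that may itself be an element of other sets). The name as_pure_set is hypothetical, and the sketch assumes that "and so forth" continues the pattern shown in the remark, in which each number is identified with the set of all smaller numbers.

```python
def as_pure_set(n: int) -> frozenset:
    """Return the pure set identified with n: 0 is {}, and k++ is k ∪ {k}."""
    current = frozenset()  # the number 0 is identified with the empty set
    for _ in range(n):
        # Pass from k to k++ by adjoining the set k itself as a new element.
        current = current | frozenset({current})
    return current


zero, one, two = as_pure_set(0), as_pure_set(1), as_pure_set(2)
print(one == frozenset({zero}))       # 1 = {0}
print(two == frozenset({zero, one}))  # 2 = {0, 1}
```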

To summarize so far, among all the objects studied in mathematics, 
some of the objects happen to be sets; and if x is an object and A is a 
set, then either x ∈ A is true or x ∈ A is false. (If A is not a set, we leave
the statement x ∈ A undefined; for instance, we consider the statement
3 ∈ 4 to be neither true nor false, but simply meaningless, since 4 is not
a set.) 

Next, we define the notion of equality: when are two sets considered 
to be equal? We do not consider the order of the elements inside a set 
to be important; thus we think of {3, 8, 5, 2} and {2, 3, 5, 8} as the same 
set. On the other hand, {3, 8, 5, 2} and {3, 8, 5, 2,1} are different sets, 
because the latter set contains an element that the former one does not, 
namely the element 1. For similar reasons {3, 8, 5, 2} and {3,8,5} are 
different sets. We formalize this as a definition: 

Definition 3.1.4 (Equality of sets). Two sets A and B are equal, A = B,
iff every element of A is an element of B and vice versa. To put it another 
way, A = B if and only if every element x of A belongs also to B, and 
every element y of B belongs also to A. 

Example 3.1.5. Thus, for instance, {1, 2, 3, 4, 5} and {3, 4, 2, 1, 5} are
the same set, since they contain exactly the same elements. (The set 
{3, 3, 1, 5, 2, 4, 2} is also equal to {1, 2, 3, 4, 5}; the repetition of 3 and 2 
is irrelevant as it does not further change the status of 2 and 3 being 
elements of the set.) 

One can easily verify that this notion of equality is reflexive, symmetric,
and transitive (Exercise 3.1.1). Observe that if x ∈ A and A = B,
then x ∈ B, by Definition 3.1.4. Thus the “is an element of” relation ∈
obeys the axiom of substitution (see Section A.7). Because of this, any
new operation we define on sets will also obey the axiom of substitution,
as long as we can define that operation purely in terms of the relation
∈. This is for instance the case for the remaining definitions in this
section. (On the other hand, we cannot use the notion of the “first” or 
“last” element in a set in a well-defined manner, because this would not 
respect the axiom of substitution; for instance the sets {1, 2, 3, 4, 5} and 
{3, 4, 2, 1, 5} are the same set, but have different first elements.) 
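
Definition 3.1.4 refers only to membership, and this can be made explicit in code. In the sketch below the hypothetical function sets_equal takes two Python lists, used here merely as written-out listings of elements, and compares them by membership alone, so that order and repetition in the listing play no role.

```python
def sets_equal(A: list, B: list) -> bool:
    """A = B iff every element of A is in B and every element of B is in A."""
    return all(x in B for x in A) and all(y in A for y in B)


print(sets_equal([3, 8, 5, 2], [2, 3, 5, 8]))              # True: order is irrelevant
print(sets_equal([3, 3, 1, 5, 2, 4, 2], [1, 2, 3, 4, 5]))  # True: repetition is irrelevant
print(sets_equal([3, 8, 5, 2], [3, 8, 5, 2, 1]))           # False: 1 lies in only one of them
```

A "first element" function on such listings would return 1 for [1, 2, 3, 4, 5] and 3 for [3, 4, 2, 1, 5], although sets_equal regards them as the same set; this is the failure of substitution described above.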

Next, we turn to the issue of exactly which objects are sets and 
which objects are not. The situation is analogous to how we defined 
the natural numbers in the previous chapter; we started with a single 
natural number, 0, and started building more numbers out of 0 using 
the increment operation. We will try something similar here, starting 
with a single set, the empty set, and building more sets out of the empty
set by various operations. We begin by postulating the existence of the 
empty set. 

Axiom 3.2 (Empty set). There exists a set ∅, known as the empty set,
which contains no elements, i.e., for every object x we have x ∉ ∅.

The empty set is also denoted {}. Note that there can only be one 
empty set; if there were two sets ∅ and ∅' which were both empty, then
by Definition 3.1.4 they would be equal to each other (why?). 

If a set is not equal to the empty set, we call it non-empty. The 
following statement is very simple, but worth stating nevertheless: 

Lemma 3.1.6 (Single choice). Let A be a non-empty set. Then there 
exists an object x such that x ∈ A.

Proof. We prove by contradiction. Suppose there does not exist any 
object x such that x ∈ A. Then for all objects x, we have x ∉ A. Also,
by Axiom 3.2 we have x ∉ ∅. Thus x ∈ A ⟺ x ∈ ∅ (both statements
are equally false), and so A = ∅ by Definition 3.1.4, a contradiction. □

Remark 3.1.7. The above Lemma asserts that given any non-empty set 
A, we are allowed to “choose” an element x of A which demonstrates this 
non-emptiness. Later on (in Lemma 3.5.12) we will show that given any
finite number of non-empty sets, say A_1, ..., A_n, it is possible to choose
one element x_1, ..., x_n from each set A_1, ..., A_n; this is known as “finite
choice”. However, in order to choose elements from an infinite number
of sets, we need an additional axiom, the axiom of choice, which we will
discuss in Section 8.4. 

Remark 3.1.8. Note that the empty set is not the same thing as the 
natural number 0. One is a set; the other is a number. However, it is 
true that the cardinality of the empty set is 0; see Section 3.6. 

If Axiom 3.2 was the only axiom that set theory had, then set theory 
could be quite boring, as there might be just a single set in existence, 
the empty set. We now present further axioms to enrich the class of sets 
available. 

Axiom 3.3 (Singleton sets and pair sets). If a is an object, then there 
exists a set {a} whose only element is a, i.e., for every object y, we have 
y ∈ {a} if and only if y = a; we refer to {a} as the singleton set whose
element is a. Furthermore, if a and b are objects, then there exists a set
{a, b} whose only elements are a and b; i.e., for every object y, we have
y ∈ {a, b} if and only if y = a or y = b; we refer to this set as the pair
set formed by a and b. 

Remarks 3.1.9. Just as there is only one empty set, there is only
one singleton set for each object a, thanks to Definition 3.1.4 (why?).
Similarly, given any two objects a and b, there is only one pair set formed
by a and b. Definition 3.1.4 also ensures that {a, b} = {b, a}
(why?) and {a, a} = {a} (why?). Thus the singleton set axiom is in fact 
redundant, being a consequence of the pair set axiom. Conversely, the 
pair set axiom will follow from the singleton set axiom and the pairwise 
union axiom below (see Lemma 3.1.13). One may wonder why we don’t 
go further and create triplet axioms, quadruplet axioms, etc.; however 
there will be no need for this once we introduce the pairwise union axiom 
below. 

Examples 3.1.10. Since ∅ is a set (and hence an object), the singleton
set {∅}, i.e., the set whose only element is ∅, is also a set (and it is not the
same set as ∅: {∅} ≠ ∅ (why?)). Similarly, the singleton set {{∅}} and
the pair set {∅, {∅}} are also sets. These three sets are not equal to each
other (Exercise 3.1.2). 

As the above examples show, we can now create quite a few sets; 
however, the sets we make are still fairly small (each set that we can 
build consists of no more than two elements, so far). The next axiom 
allows us to build somewhat larger sets than before. 

Axiom 3.4 (Pairwise union). Given any two sets A, B, there exists a
set A ∪ B, called the union A ∪ B of A and B, whose elements consist
of all the elements which belong to A or B or both. In other words, for
any object x,

x ∈ A ∪ B ⟺ (x ∈ A or x ∈ B).

Recall that “or” refers by default in mathematics to inclusive or: “X
or Y is true” means that “either X is true, or Y is true, or both are
true”. See Section A.1.

Example 3.1.11. The set {1, 2} ∪ {2, 3} consists of those elements which
either lie in {1, 2} or in {2, 3} or in both, or in other words the elements
of this set are simply 1, 2, and 3. Because of this, we denote this set as
{1, 2} ∪ {2, 3} = {1, 2, 3}.

Remark 3.1.12. If A, B, A' are sets, and A is equal to A', then A ∪ B
is equal to A' ∪ B (why? One needs to use Axiom 3.4 and Definition
3.1.4). Similarly if B' is a set which is equal to B, then A ∪ B is equal
to A ∪ B'. Thus the operation of union obeys the axiom of substitution,
and is thus well-defined on sets. 

We now give some basic properties of unions.

Lemma 3.1.13. If a and b are objects, then {a, b} = {a} ∪ {b}. If
A, B, C are sets, then the union operation is commutative (i.e., A ∪ B =
B ∪ A) and associative (i.e., (A ∪ B) ∪ C = A ∪ (B ∪ C)). Also, we have
A ∪ A = A ∪ ∅ = ∅ ∪ A = A.

Proof. We prove just the associativity identity (A ∪ B) ∪ C = A ∪ (B ∪ C),
and leave the remaining claims to Exercise 3.1.3. By Definition 3.1.4,
we need to show that every element x of (A ∪ B) ∪ C is an element of
A ∪ (B ∪ C), and vice versa. So suppose first that x is an element of
(A ∪ B) ∪ C. By Axiom 3.4, this means that at least one of x ∈ A ∪ B or
x ∈ C is true. We now divide into two cases. If x ∈ C, then by Axiom 3.4
again x ∈ B ∪ C, and so by Axiom 3.4 again we have x ∈ A ∪ (B ∪ C).
Now suppose instead x ∈ A ∪ B, then by Axiom 3.4 again x ∈ A or
x ∈ B. If x ∈ A then x ∈ A ∪ (B ∪ C) by Axiom 3.4, while if x ∈ B
then by consecutive applications of Axiom 3.4 we have x ∈ B ∪ C and
hence x ∈ A ∪ (B ∪ C). Thus in all cases we see that every element of
(A ∪ B) ∪ C lies in A ∪ (B ∪ C). A similar argument shows that every
element of A ∪ (B ∪ C) lies in (A ∪ B) ∪ C, and so (A ∪ B) ∪ C = A ∪ (B ∪ C)
as desired. □

Because of the above lemma, we do not need to use parentheses 
to denote multiple unions, thus for instance we can write A ∪ B ∪ C
instead of (A ∪ B) ∪ C or A ∪ (B ∪ C). Similarly for unions of four sets,
A ∪ B ∪ C ∪ D, etc.
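
The identities of Lemma 3.1.13 can be tried out on a few small finite sets. In the sketch below, union is a hypothetical helper built on Python's frozenset, and the sample sets are arbitrary; passing such a check on samples is of course much weaker than the lemma itself.

```python
from itertools import product


def union(A: frozenset, B: frozenset) -> frozenset:
    """A ∪ B: the set of objects that lie in A or in B (or both)."""
    return frozenset(list(A) + list(B))


samples = [frozenset(), frozenset({1}), frozenset({1, 2}),
           frozenset({2, 3}), frozenset({3, 4, 5})]
empty = frozenset()

laws_hold = all(
    union(A, B) == union(B, A)                             # commutativity
    and union(union(A, B), C) == union(A, union(B, C))     # associativity
    and union(A, A) == union(A, empty) == union(empty, A) == A
    for A, B, C in product(samples, repeat=3)
)
print(laws_hold)
print(union(frozenset({1, 2}), frozenset({2, 3})) == frozenset({1, 2, 3}))
```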

Remark 3.1.14. While the operation of union has some similarities 
with addition, the two operations are not identical. For instance, {2} ∪
{3} = {2, 3} and 2 + 3 = 5, whereas {2} + {3} is meaningless (addition
pertains to numbers, not sets) and 2 ∪ 3 is also meaningless (union
pertains to sets, not numbers). 

This axiom allows us to define triplet sets, quadruplet sets, and so 
forth: if a, b, c are three objects, we define {a, b, c} := {a} ∪ {b} ∪ {c}; if
a, b, c, d are four objects, then we define {a, b, c, d} := {a} ∪ {b} ∪ {c} ∪ {d},
and so forth. On the other hand, we are not yet in a position to define 
sets consisting of n objects for any given natural number n; this would 
require iterating the above construction “n times”, but the concept of 
n-fold iteration has not yet been rigorously defined. For similar reasons, 
we cannot yet define sets consisting of infinitely many objects, because 
that would require iterating the axiom of pairwise union infinitely often, 
and it is not clear at this stage that one can do this rigorously. Later on,
we will introduce other axioms of set theory which allow one to construct 
arbitrarily large, and even infinite, sets. 

Clearly, some sets seem to be larger than others. One way to for- 
malize this concept is through the notion of a subset. 

Definition 3.1.15 (Subsets). Let A, B be sets. We say that A is a
subset of B, denoted A ⊆ B, iff every element of A is also an element of
B, i.e.

For any object x, x ∈ A ⟹ x ∈ B.

We say that A is a proper subset of B, denoted A ⊊ B, if A ⊆ B and
A ≠ B.

Remark 3.1.16. Because these definitions involve only the notions of 
equality and the “is an element of” relation, both of which already obey 
the axiom of substitution, the notion of subset also automatically obeys 
the axiom of substitution. Thus for instance if A ⊆ B and A = A', then
A' ⊆ B.

Examples 3.1.17. We have {1, 2, 4} ⊆ {1, 2, 3, 4, 5}, because every element
of {1, 2, 4} is also an element of {1, 2, 3, 4, 5}. In fact we also have
{1, 2, 4} ⊊ {1, 2, 3, 4, 5}, since the two sets {1, 2, 4} and {1, 2, 3, 4, 5} are
not equal. Given any set A, we always have A ⊆ A (why?) and ∅ ⊆ A
(why?). 
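
Definition 3.1.15 translates almost word for word into code. In this sketch the names is_subset and is_proper_subset are hypothetical, and finite frozenset values stand in for the sets A and B.

```python
def is_subset(A: frozenset, B: frozenset) -> bool:
    """A ⊆ B iff every element of A is also an element of B."""
    return all(x in B for x in A)


def is_proper_subset(A: frozenset, B: frozenset) -> bool:
    """A ⊊ B iff A ⊆ B and A ≠ B."""
    return is_subset(A, B) and A != B


small = frozenset({1, 2, 4})
large = frozenset({1, 2, 3, 4, 5})
print(is_subset(small, large))         # True
print(is_proper_subset(small, large))  # True, since the two sets differ
print(is_subset(large, large))         # True: A ⊆ A
print(is_proper_subset(large, large))  # False: A is never a proper subset of itself
print(is_subset(frozenset(), large))   # True: the condition holds vacuously
```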

The notion of subset in set theory is similar to the notion of “less 
than or equal to” for numbers, as the following Proposition demonstrates 
(for a more precise statement, see Definition 8.5.1): 

Proposition 3.1.18 (Sets are partially ordered by set inclusion). Let 
A, B, C be sets. If A ⊆ B and B ⊆ C then A ⊆ C. If A ⊆ B and
B ⊆ A, then A = B. Finally, if A ⊊ B and B ⊊ C then A ⊊ C.

Proof. We shall just prove the first claim. Suppose that A ⊆ B and
B ⊆ C. To prove that A ⊆ C, we have to prove that every element of A
is an element of C. So, let us pick an arbitrary element x of A. Then,
since A ⊆ B, x must then be an element of B. But then since B ⊆ C,
x is an element of C. Thus every element of A is indeed an element of
C, as claimed. □

Remark 3.1.19. There is a relationship between subsets and unions: 
see for instance Exercise 3.1.7. 



Remark 3.1.20. There is one important difference between the subset 
relation ⊊ and the less than relation <. Given any two distinct natural
numbers n, m, we know that one of them is smaller than the other 
(Proposition 2.2.13); however, given two distinct sets, it is not in general 
true that one of them is a subset of the other. For instance, take A : = 
{2?r : n € N} to be the set of even natural numbers, and B := {2n + 1 : 
n G N} to be the set of odd natural numbers. Then neither set is 
a subset of the other. This is why we say that sets are only partially 
ordered, whereas the natural numbers are totally ordered (see Definitions 
8.5.1, 8.5.3). 


Remark 3.1.21. We should also caution that the subset relation ⊆ is
not the same as the element relation ∈. The number 2 is an element of
{1, 2, 3} but not a subset; thus 2 ∈ {1, 2, 3}, but 2 ⊈ {1, 2, 3}. Indeed, 2
is not even a set. Conversely, while {2} is a subset of {1, 2, 3}, it is not
an element; thus {2} ⊆ {1, 2, 3} but {2} ∉ {1, 2, 3}. The point is that
the number 2 and the set {2} are distinct objects. It is important to
distinguish sets from their elements, as they can have different
properties. For instance, it is possible to have an infinite set consisting of finite
numbers (the set N of natural numbers is one such example), and it is
also possible to have a finite set consisting of infinite objects (consider
for instance the finite set {N, Z, Q, R}, which has four elements, all of
which are infinite).
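
The distinction between "element of" and "subset of" can be mirrored in Python, offered here only as an informal analogy. The sketch assumes Python's `in` as a stand-in for the element relation and `<=` as a stand-in for the subset relation; `frozenset` is used because Python only permits hashable objects as elements of a set.

```python
S = {1, 2, 3}

two_is_element = 2 in S                      # 2 is an element of {1, 2, 3}
singleton_is_subset = {2} <= S               # {2} is a subset of {1, 2, 3}
singleton_is_element = frozenset({2}) in S   # but {2} is not an element of {1, 2, 3}

# A different set, which really does have {2} as one of its elements:
T = {1, frozenset({2})}
singleton_in_T = frozenset({2}) in T         # {2} is an element of {1, {2}}
two_in_T = 2 in T                            # but 2 itself is not
```

The second half of the sketch shows that the number 2 and the set {2} behave as distinct objects: T contains one but not the other.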

We now give an axiom which easily allows us to create subsets out 
of larger sets. 

Axiom 3.5 (Axiom of specification). Let A be a set, and for each x ∈
A, let P(x) be a property pertaining to x (i.e., P(x) is either a true
statement or a false statement). Then there exists a set, called {x ∈ A :
P(x) is true} (or simply {x ∈ A : P(x)} for short), whose elements are
precisely the elements x in A for which P(x) is true. In other words,
for any object y,

y ∈ {x ∈ A : P(x) is true} ⟺ (y ∈ A and P(y) is true).

This axiom is also known as the axiom of separation. Note that
{x ∈ A : P(x) is true} is always a subset of A (why?), though it could
be as large as A or as small as the empty set. One can verify that
the axiom of substitution works for specification, thus if A = A' then
{x ∈ A : P(x)} = {x ∈ A' : P(x)} (why?).

Example 3.1.22. Let S := {1, 2, 3, 4, 5}. Then the set {n ∈ S : n < 4}
is the set of those elements n in S for which n < 4 is true, i.e., {n ∈ S :
n < 4} = {1, 2, 3}. Similarly, the set {n ∈ S : n < 7} is the same as S
itself, while {n ∈ S : n < 1} is the empty set.
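
For finite sets, set-builder notation of this kind corresponds closely to a Python set comprehension with a filter. The following sketch is an informal illustration only; the helper name `specify` is ours, and the predicates are passed in as ordinary Python functions.

```python
def specify(A, P):
    """Return the set of those x in A for which P(x) is true."""
    return {x for x in A if P(x)}

S = {1, 2, 3, 4, 5}

below_4 = specify(S, lambda n: n < 4)   # the elements 1, 2, 3
below_7 = specify(S, lambda n: n < 7)   # all of S
below_1 = specify(S, lambda n: n < 1)   # the empty set

# Whatever the predicate, the result is a subset of the set we started from.
always_subset = all(T <= S for T in (below_4, below_7, below_1))
```

The three results show the range of possibilities mentioned above: a proper non-empty subset, the whole set, and the empty set.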


We sometimes write {x ∈ A | P(x)} instead of {x ∈ A : P(x)}; this
is useful when we are using the colon to denote something else, for
instance to denote the range and domain of a function f : X → Y.

We can use this axiom of specification to define some further opera- 
tions on sets, namely intersections and difference sets. 


Definition 3.1.23 (Intersections). The intersection S₁ ∩ S₂ of two sets
is defined to be the set

S₁ ∩ S₂ := {x ∈ S₁ : x ∈ S₂}.

In other words, S₁ ∩ S₂ consists of all the elements which belong to both
S₁ and S₂. Thus, for all objects x,

x ∈ S₁ ∩ S₂ ⟺ x ∈ S₁ and x ∈ S₂.

Remark 3.1.24. Note that this definition is well-defined (i.e., it obeys 
the axiom of substitution, see Section A.7) because it is defined in terms
of more primitive operations which were already known to obey the 
axiom of substitution. Similar remarks apply to future definitions in 
this chapter and will usually not be mentioned explicitly again. 

Examples 3.1.25. We have {1, 2, 4} ∩ {2, 3, 4} = {2, 4}, {1, 2} ∩ {3, 4} =
∅, {2, 3} ∪ ∅ = {2, 3}, and {2, 3} ∩ ∅ = ∅.

Remark 3.1.26. By the way, one should be careful with the English
word “and”: rather confusingly, it can mean either union or intersection,
depending on context. For instance, if one talks about a set of “boys and 
girls”, one means the union of a set of boys with a set of girls, but if one 
talks about the set of people who are single and male, then one means 
the intersection of the set of single people with the set of male people. 
(Can you work out the rule of grammar that determines when “and” 
means union and when “and” means intersection?) Another problem is 
that “and” is also used in English to denote addition, thus for instance
one could say that “2 and 3 is 5”, while also saying that “the elements of
{2} and the elements of {3} form the set {2, 3}” and “the elements in {2}
and {3} form the set ∅”. This can certainly get confusing! One reason we
resort to mathematical symbols instead of English words such as “and” 
is that mathematical symbols always have a precise and unambiguous 
meaning, whereas one must often look very carefully at the context in 
order to work out what an English word means. 

Two sets A, B are said to be disjoint if A ∩ B = ∅. Note that this
is not the same concept as being distinct, A ≠ B. For instance, the sets
{1, 2, 3} and {2, 3, 4} are distinct (there are elements of one set which are
not elements of the other) but not disjoint (because their intersection is
non-empty). Meanwhile, the sets ∅ and ∅ are disjoint but not distinct
(why?).

Definition 3.1.27 (Difference sets). Given two sets A and B, we define
the set A − B or A\B to be the set A with any elements of B removed:

A\B := {x ∈ A : x ∉ B};

for instance, {1, 2, 3, 4}\{2, 4, 6} = {1, 3}. In many cases B will be a
subset of A, but not necessarily.
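
Intersections and difference sets are both obtained by specification, and for finite sets this can be made concrete. In the informal Python sketch below, the helper names `intersection` and `difference` are ours; each is written as a filter on the first set, following the definitions, and the results are compared with Python's built-in `&` and `-` operators. The last few lines revisit the difference between "disjoint" and "distinct".

```python
def intersection(S1, S2):
    """The elements of S1 which also belong to S2."""
    return {x for x in S1 if x in S2}

def difference(A, B):
    """The set A with any elements of B removed."""
    return {x for x in A if x not in B}

meet = intersection({1, 2, 4}, {2, 3, 4})     # the elements 2 and 4
no_overlap = intersection({1, 2}, {3, 4})     # empty, so these sets are disjoint
diff = difference({1, 2, 3, 4}, {2, 4, 6})    # the elements 1 and 3

# Distinct but not disjoint: {1, 2, 3} and {2, 3, 4}.
A, B = {1, 2, 3}, {2, 3, 4}
distinct = A != B
disjoint = intersection(A, B) == set()

# Disjoint but not distinct: the empty set and itself.
empty_disjoint = intersection(set(), set()) == set()
empty_distinct = set() != set()
```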

We now give some basic properties of unions, intersections, and dif- 
ference sets. 

Proposition 3.1.28 (Sets form a boolean algebra). Let A, B, C be sets,
and let X be a set containing A, B, C as subsets.

(a) (Minimal element) We have A ∪ ∅ = A and A ∩ ∅ = ∅.

(b) (Maximal element) We have A ∪ X = X and A ∩ X = A.

(c) (Identity) We have A ∩ A = A and A ∪ A = A.

(d) (Commutativity) We have A ∪ B = B ∪ A and A ∩ B = B ∩ A.

(e) (Associativity) We have (A ∪ B) ∪ C = A ∪ (B ∪ C) and (A ∩ B) ∩ C =
A ∩ (B ∩ C).

(f) (Distributivity) We have A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C) and
A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C).

(g) (Partition) We have A ∪ (X\A) = X and A ∩ (X\A) = ∅.

(h) (De Morgan laws) We have X\(A ∪ B) = (X\A) ∩ (X\B) and
X\(A ∩ B) = (X\A) ∪ (X\B).

Remark 3.1.29. The de Morgan laws are named after the logician 
Augustus De Morgan (1806-1871), who identified them as one of the 
basic laws of set theory. 

Proof. See Exercise 3.1.6. □ 
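
A proof must of course handle arbitrary sets, but it can be reassuring to test the laws on a concrete instance first. The Python sketch below is only a sanity check on one hypothetical choice of X, A, B, C (chosen by us for illustration); it is not a substitute for the exercise. Here `|`, `&` and `-` are Python's union, intersection and difference operators.

```python
X = set(range(10))          # ambient set containing A, B, C as subsets
A = {0, 1, 2, 3}
B = {2, 3, 4, 5}
C = {3, 5, 7, 9}
EMPTY = set()

laws = {
    "minimal":        (A | EMPTY == A) and (A & EMPTY == EMPTY),
    "maximal":        (A | X == X) and (A & X == A),
    "identity":       (A & A == A) and (A | A == A),
    "commutativity":  (A | B == B | A) and (A & B == B & A),
    "associativity":  ((A | B) | C == A | (B | C))
                      and ((A & B) & C == A & (B & C)),
    "distributivity": (A & (B | C) == (A & B) | (A & C))
                      and (A | (B & C) == (A | B) & (A | C)),
    "partition":      (A | (X - A) == X) and (A & (X - A) == EMPTY),
    "de_morgan":      (X - (A | B) == (X - A) & (X - B))
                      and (X - (A & B) == (X - A) | (X - B)),
}

all_hold = all(laws.values())
```

A single passing instance proves nothing in general, but a failing one would immediately expose a mis-stated law.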

Remark 3.1.30. The reader may observe a certain symmetry in the 
above laws between U and n, and between X and 0. This is an example 
of duality - two distinct properties or objects being dual to each other. 
In this case, the duality is manifested by the complementation relation
A ↦ X\A; the de Morgan laws assert that this relation converts unions
into intersections and vice versa. (It also interchanges X and the empty
set.) The above laws are collectively known as the laws of Boolean
algebra, after the mathematician George Boole (1815-1864), and are
also applicable to a number of objects other than sets; they play a
particularly important role in logic.

We have now accumulated a number of axioms and results about 
sets, but there are still many things we are not able to do yet. One of 
the basic things we wish to do with a set is take each of the objects of 
that set, and somehow transform each such object into a new object; for 
instance we may wish to start with a set of numbers, say {3,5,9}, and 
increment each one, creating a new set {4, 6, 10}. This is not something 
we can do directly using only the axioms we already have, so we need a 
new axiom: 

Axiom 3.6 (Replacement). Let A be a set. For any object x ∈ A, and
any object y, suppose we have a statement P(x, y) pertaining to x and
y, such that for each x ∈ A there is at most one y for which P(x, y) is
true. Then there exists a set {y : P(x, y) is true for some x ∈ A}, such
that for any object z,

z ∈ {y : P(x, y) is true for some x ∈ A}
⟺ P(x, z) is true for some x ∈ A.

Example 3.1.31. Let A := {3, 5, 9}, and let P(x, y) be the statement
y = x++, i.e., y is the successor of x. Observe that for every x ∈ A, there
is exactly one y for which P(x, y) is true - specifically, the successor of
x. Thus the above axiom asserts that the set {y : y = x++ for some x ∈
{3, 5, 9}} exists; in this case, it is clearly the same set as {4, 6, 10} (why?).

Example 3.1.32. Let A = {3, 5, 9}, and let P(x, y) be the statement
y = 1. Then again for every x ∈ A, there is exactly one y
for which P(x, y) is true - specifically, the number 1. In this case
{y : y = 1 for some x ∈ {3, 5, 9}} is just the singleton set {1}; we have
replaced each element 3,5,9 of the original set A by the same object, 
namely 1. Thus this rather silly example shows that the set obtained by 
the above axiom can be “smaller” than the original set. 

We often abbreviate a set of the form

{y : y = f(x) for some x ∈ A}

as {f(x) : x ∈ A} or {f(x) | x ∈ A}. Thus for instance, if A = {3, 5, 9},
then {x++ : x ∈ A} is the set {4, 6, 10}. We can of course combine the
axiom of replacement with the axiom of specification, thus for instance
we can create sets such as {f(x) : x ∈ A; P(x) is true} by starting with
the set A, using the axiom of specification to create the set {x ∈ A :
P(x) is true}, and then applying the axiom of replacement to create
{f(x) : x ∈ A; P(x) is true}. Thus for instance {n++ : n ∈ {3, 5, 9}; n <
6} = {4, 6}.
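
For finite sets, the abbreviated notation reads almost verbatim as a Python set comprehension: the expression before `for` plays the role of replacement, and an optional `if` clause plays the role of specification. The sketch below is an informal illustration, with x + 1 standing in for the successor x++.

```python
A = {3, 5, 9}

successors = {x + 1 for x in A}            # replace each x by its successor
constant = {1 for x in A}                  # replace each x by the same object 1
filtered = {n + 1 for n in A if n < 6}     # specification first, then replacement

# Replacement may shrink a set, but it cannot produce more elements than A has.
sizes_ok = len(constant) <= len(A) and len(successors) <= len(A)
```

The `constant` example collapses three elements into one, which is the sense in which the resulting set can be "smaller" than the original.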

In many of our examples we have implicitly assumed that natural 
numbers are in fact objects. Let us formalize this as follows. 

Axiom 3.7 (Infinity). There exists a set N, whose elements are called
natural numbers, as well as an object 0 in N, and an object n++ assigned
to every natural number n ∈ N, such that the Peano axioms (Axioms
2.1-2.5) hold.

This is the more formal version of Assumption 2.6. It is called the 
axiom of infinity because it introduces the most basic example of an 
infinite set, namely the set of natural numbers N. (We will formalize 
what finite and infinite mean in Section 3.6.) From the axiom of infinity 
we see that numbers such as 3, 5, 7, etc. are indeed objects in set theory, 
and so (from the pair set axiom and pairwise union axiom) we can indeed 
legitimately construct sets such as {3, 5, 9} as we have been doing in our 
examples. 

One has to keep the concept of a set distinct from the elements of
that set; for instance, the set {n + 3 : n ∈ N, 0 ≤ n ≤ 5} is not the same
thing as the expression or function n + 3. We emphasize this with an
example:

Example 3.1.33. (Informal) This example requires the notion of
subtraction, which has not yet been formally introduced. The following two
sets are equal,

{n + 3 : n ∈ N, 0 ≤ n ≤ 5} = {8 − n : n ∈ N, 0 ≤ n ≤ 5}, (3.1)

(see below), even though the expressions n + 3 and 8 − n are never
equal to each other for any natural number n. Thus, it is a good idea
to remember to use those curly braces {} when you talk about sets,
lest you accidentally confuse a set with its elements. One reason for
this counter-intuitive situation is that the letter n is being used in two
different ways on the two sides of (3.1). To clarify the situation, let us
rewrite the set {8 − n : n ∈ N, 0 ≤ n ≤ 5} by replacing the letter n by
the letter m, thus giving {8 − m : m ∈ N, 0 ≤ m ≤ 5}. This is exactly
the same set as before (why?), so we can rewrite (3.1) as

{n + 3 : n ∈ N, 0 ≤ n ≤ 5} = {8 − m : m ∈ N, 0 ≤ m ≤ 5}.

Now it is easy to see (using Definition 3.1.4) why this identity is true: every
number of the form n + 3, where n is a natural number between 0 and
5, is also of the form 8 − m where m := 5 − n (note that m is therefore
also a natural number between 0 and 5); conversely, every number of
the form 8 − m, where m is a natural number between 0 and 5, is also
of the form n + 3, where n := 5 − m (note that n is therefore a natural
number between 0 and 5). Observe how much more confusing the above
explanation of (3.1) would have been if we had not changed one of the
n’s to an m first!
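
The identity (3.1) is small enough to check by direct enumeration. The Python sketch below is informal and makes one assumption that should be stated loudly: n is taken to range over 0, 1, ..., 5, with both endpoints included. It confirms that the two sets coincide even though the two expressions never agree at the same value of n, and that the substitution m := 5 − n is what pairs the elements up.

```python
N_RANGE = range(0, 6)   # assumption: n = 0, 1, ..., 5 (endpoints included)

left = {n + 3 for n in N_RANGE}     # numbers of the form n + 3
right = {8 - n for n in N_RANGE}    # numbers of the form 8 - n

sets_equal = left == right

# The two expressions never take the same value at the same n.
never_equal_pointwise = all(n + 3 != 8 - n for n in N_RANGE)

# With m := 5 - n, the number n + 3 is exactly 8 - m, and m stays in range.
matched = all(n + 3 == 8 - (5 - n) and (5 - n) in N_RANGE for n in N_RANGE)
```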


— Exercises — 

Exercise 3.1.1. Show that the definition of equality in Definition 3.1.4 is reflex- 
ive, symmetric, and transitive. 

Exercise 3.1.2. Using only Definition 3.1.4, Axiom 3.1, Axiom 3.2, and Axiom
3.3, prove that the sets ∅, {∅}, {{∅}}, and {∅, {∅}} are all distinct (i.e., no two
of them are equal to each other).

Exercise 3.1.3. Prove the remaining claims in Lemma 3.1.13. 

Exercise 3.1.4. Prove the remaining claims in Proposition 3.1.18. 

Exercise 3.1.5. Let A, B be sets. Show that the three statements A ⊆ B,
A ∪ B = B, A ∩ B = A are logically equivalent (any one of them implies the
other two).





Exercise 3.1.6. Prove Proposition 3.1.28. (Hint: one can use some of these 
claims to prove others. Some of the claims have also appeared previously in 
Lemma 3.1.13.) 

Exercise 3.1.7. Let A, B, C be sets. Show that A ∩ B ⊆ A and A ∩ B ⊆ B.
Furthermore, show that C ⊆ A and C ⊆ B if and only if C ⊆ A ∩ B. In a
similar spirit, show that A ⊆ A ∪ B and B ⊆ A ∪ B, and furthermore that
A ⊆ C and B ⊆ C if and only if A ∪ B ⊆ C.

Exercise 3.1.8. Let A, B be sets. Prove the absorption laws A ∩ (A ∪ B) = A
and A ∪ (A ∩ B) = A.

Exercise 3.1.9. Let A, B, X be sets such that A ∪ B = X and A ∩ B = ∅. Show
that A = X\B and B = X\A.

Exercise 3.1.10. Let A and B be sets. Show that the three sets A\B, A ∩ B,
and B\A are disjoint, and that their union is A ∪ B.

Exercise 3.1.11. Show that the axiom of replacement implies the axiom of 
specification. 


3.2 Russell’s paradox (Optional) 

Many of the axioms introduced in the previous section have a similar
flavor: they allow us to form a set consisting of all the elements
which have a certain property. They are all plausible, but one might
think that they could be unified, for instance by introducing the follow- 
ing axiom: 

Axiom 3.8 (Universal specification). (Dangerous!) Suppose for every
object x we have a property P(x) pertaining to x (so that for every x,
P(x) is either a true statement or a false statement). Then there exists
a set {x : P(x) is true} such that for every object y,

y ∈ {x : P(x) is true} ⟺ P(y) is true.

This axiom is also known as the axiom of comprehension. It asserts 
that every property corresponds to a set; if we assumed that axiom, 
we could talk about the set of all blue objects, the set of all natural 
numbers, the set of all sets, and so forth. This axiom also implies most of 
the axioms in the previous section (Exercise 3.2.1). Unfortunately, this 
axiom cannot be introduced into set theory, because it creates a logical 
contradiction known as Russell’s paradox, discovered by the philosopher
and logician Bertrand Russell (1872-1970) in 1901. The paradox runs
as follows. Let P(x) be the statement

P(x) ⟺ “x is a set, and x ∉ x”;

i.e., P(x) is true only when x is a set which does not contain itself. For
instance, P({2, 3, 4}) is true, since the set {2, 3, 4} is not one of the three
elements 2, 3, 4 of {2, 3, 4}. On the other hand, if we let S be the set
of all sets (which we would know to exist from the axiom of universal
specification), then since S is itself a set, it is an element of S, and so
P(S) is false. Now use the axiom of universal specification to create the
set

Ω := {x : P(x) is true} = {x : x is a set and x ∉ x},

i.e., the set of all sets which do not contain themselves. Now ask the
question: does Ω contain itself, i.e. is Ω ∈ Ω? If Ω did contain itself,
then by definition this means that P(Ω) is true, i.e., Ω is a set and
Ω ∉ Ω. On the other hand, if Ω did not contain itself, then P(Ω) would
be true, and hence Ω ∈ Ω. Thus in either case we have both Ω ∈ Ω and
Ω ∉ Ω, which is absurd.
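
There is a loose programming analogy to this paradox, offered here purely as an illustration and not as a formalization. Suppose (this is our modelling choice, not something used in the text) that we represent a "set" by its membership test, a Python function s such that s(y) is True exactly when y belongs to the set. Unrestricted comprehension then lets us write down the membership test of the paradoxical set directly, and asking whether that set contains itself sends the interpreter into an unending regress.

```python
def omega(x):
    """Membership test for the would-be set of all sets not containing themselves."""
    return not x(x)

def empty(y):
    """Membership test for the empty set: nothing belongs to it."""
    return False

# The empty set does not contain itself, so it belongs to omega.
empty_in_omega = omega(empty)

def ask_does_omega_contain_itself():
    # Evaluating omega(omega) needs the value of "not omega(omega)", and so on.
    try:
        return omega(omega)
    except RecursionError:
        return "no consistent answer"

answer = ask_does_omega_contain_itself()
```

The interpreter gives up with a RecursionError rather than returning True or False, which mirrors the fact that neither answer is consistent.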

The problem with the above axiom is that it creates sets which are far 
too “large” - for instance, we can use that axiom to talk about the set of 
all objects (a so-called “universal set”). Since sets are themselves objects 
(Axiom 3.1), this means that sets are allowed to contain themselves, 
which is a somewhat silly state of affairs. One way to informally resolve 
this issue is to think of objects as being arranged in a hierarchy. At the 
bottom of the hierarchy are the primitive objects - the objects that are
not sets¹, such as the natural number 37. Then on the next rung of the
hierarchy there are sets whose elements consist only of primitive objects,
such as {3, 4, 7} or the empty set ∅; let’s call these “primitive sets” for
now. Then there are sets whose elements consist only of primitive objects 
and primitive sets, such as {3, 4, 7, {3, 4, 7}}. Then we can form sets out 
of these objects, and so forth. The point is that at each stage of the 
hierarchy we only see sets whose elements consist of objects at lower 
stages of the hierarchy, and so at no stage do we ever construct a set 
which contains itself. 

To formalize the above intuition of a hierarchy of objects
is actually rather complicated, and we will not do so here. Instead, we
shall simply postulate an axiom which ensures that absurdities such as 
Russell’s paradox do not occur. 

Axiom 3.9 (Regularity). If A is a non-empty set, then there is at least 
one element x of A which is either not a set, or is disjoint from A. 


¹ In pure set theory, there will be no primitive objects, but there will be one
primitive set ∅ on the next rung of the hierarchy.

The point of this axiom (which is also known as the axiom of
foundation) is that it is asserting that at least one of the elements of A is so
low on the hierarchy of objects that it does not contain any of the other
elements of A. For instance, if A = {{3, 4}, {3, 4, {3, 4}}}, then the
element {3, 4} ∈ A does not contain any of the elements of A (neither 3 nor
4 lies in A), although the element {3, 4, {3, 4}}, being somewhat higher
in the hierarchy, does contain an element of A, namely {3, 4}. One
particular consequence of this axiom is that sets are no longer allowed to
contain themselves (Exercise 3.2.2).
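
For a small finite example, one can search directly for the element whose existence the axiom asserts. The Python sketch below is informal: the helper name `regular_witnesses` is ours, nested sets are modelled with `frozenset` (since Python requires set elements to be hashable), and anything that is not a `frozenset` is treated as a primitive object.

```python
def regular_witnesses(A):
    """Elements x of A which are not sets, or which are sets disjoint from A."""
    return [x for x in A
            if not isinstance(x, frozenset) or x.isdisjoint(A)]

inner = frozenset({3, 4})               # the set {3, 4}
outer = frozenset({3, 4, inner})        # the set {3, 4, {3, 4}}
A = frozenset({inner, outer})           # the set {{3, 4}, {3, 4, {3, 4}}}

witnesses = regular_witnesses(A)
```

Only the element {3, 4} qualifies here; the other element contains {3, 4}, which is itself an element of A, so it is not disjoint from A.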

One can legitimately ask whether we really need this axiom in our 
set theory, as it is certainly less intuitive than our other axioms. For 
the purposes of doing analysis, it turns out in fact that this axiom is 
never needed; all the sets we consider in analysis are typically very low 
on the hierarchy of objects, for instance being sets of primitive objects, 
or sets of sets of primitive objects, or at worst sets of sets of sets of 
primitive objects. However it is necessary to include this axiom in order 
to perform more advanced set theory, and so we have included this axiom 
in the text (but in an optional section) for sake of completeness. 


— Exercises — 

Exercise 3.2.1. Show that the universal specification axiom, Axiom 3.8, if as- 
sumed to be true, would imply Axioms 3.2, 3.3, 3.4, 3.5, and 3.6. (If we assume 
that all natural numbers are objects, we also obtain Axiom 3.7.) Thus, this 
axiom, if permitted, would simplify the foundations of set theory tremendously 
(and can be viewed as one basis for an intuitive model of set theory known as 
“naive set theory”). Unfortunately, as we have seen, Axiom 3.8 is “too good 
to be true”! 

Exercise 3.2.2. Use the axiom of regularity (and the singleton set axiom) to
show that if A is a set, then A ∉ A. Furthermore, show that if A and B are
two sets, then either A ∉ B or B ∉ A (or both).

Exercise 3.2.3. Show (assuming the other axioms of set theory) that the
universal specification axiom, Axiom 3.8, is equivalent to an axiom postulating
the existence of a “universal set” Ω consisting of all objects (i.e., for all objects
x, we have x ∈ Ω). In other words, if Axiom 3.8 is true, then a universal set
exists, and conversely, if a universal set exists, then Axiom 3.8 is true. (This may
explain why Axiom 3.8 is called the axiom of universal specification.) Note
that if a universal set Ω existed, then we would have Ω ∈ Ω by Axiom 3.1,
contradicting Exercise 3.2.2. Thus the axiom of foundation specifically rules
out the axiom of universal specification.


3.3 Functions

In order to do analysis, it is not particularly useful to just have the 
notion of a set; we also need the notion of a function from one set to 
another. Informally, a function f : X → Y from one set X to another
set Y is an operation which assigns to each element (or “input”) x in
X, a single element (or “output”) f(x) in Y; we have already used this
informal concept in the previous chapter when we discussed the natural 
numbers. The formal definition is as follows. 

Definition 3.3.1 (Functions). Let X, Y be sets, and let P(x, y) be a
property pertaining to an object x ∈ X and an object y ∈ Y, such that
for every x ∈ X, there is exactly one y ∈ Y for which P(x, y) is true
(this is sometimes known as the vertical line test). Then we define the
function f : X → Y defined by P on the domain X and range Y to be
the object which, given any input x ∈ X, assigns an output f(x) ∈ Y,
defined to be the unique object f(x) for which P(x, f(x)) is true. Thus,
for any x ∈ X and y ∈ Y,

y = f(x) ⟺ P(x, y) is true.
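
When X and Y are small finite sets, the vertical line test can be carried out by brute force. The Python sketch below is an informal illustration under that assumption: the helper name `function_from_property` and the finite stand-ins X and Y are ours, and the resulting function is stored as a dictionary from inputs to outputs.

```python
def function_from_property(X, Y, P):
    """Build f : X -> Y from a property P, or raise if P fails the vertical line test."""
    f = {}
    for x in X:
        outputs = [y for y in Y if P(x, y)]
        if len(outputs) != 1:
            raise ValueError(
                f"input {x!r} has {len(outputs)} valid outputs, need exactly 1")
        f[x] = outputs[0]
    return f

X = {0, 1, 2, 3}
Y = {0, 1, 2, 3, 4}

# P(x, y): "y is the successor of x" defines a function from X to Y.
increment = function_from_property(X, Y, lambda x, y: y == x + 1)

# P(x, y): "x is the successor of y" does not: the input 0 has no valid output.
try:
    function_from_property(X, Y, lambda x, y: y + 1 == x)
    decrement_defined_on_X = True
except ValueError:
    decrement_defined_on_X = False

# Removing 0 from the domain repairs the definition.
decrement = function_from_property(X - {0}, Y, lambda x, y: y + 1 == x)
```

The same property can thus define a function on one domain and fail to define one on another, which is why the domain and range are part of the data of a function.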

Functions are also referred to as maps or transformations , depending 
on the context. They are also sometimes called morphisms, although to 
be more precise, a morphism refers to a more general class of object, 
which may or may not correspond to actual functions, depending on the 
context. 

Example 3.3.2. Let X = N, Y = N, and let P(x, y) be the property
that y = x++. Then for each x ∈ N there is exactly one y for which
P(x, y) is true, namely y = x++. Thus we can define a function f : N →
N associated to this property, so that f(x) = x++ for all x; this is the
increment function on N, which takes a natural number as input and
returns its increment as output. Thus for instance f(4) = 5, f(2n + 3) =
2n + 4 and so forth. One might also hope to define a decrement function
g : N → N associated to the property P(x, y) defined by y++ = x, i.e.,
g(x) would be the number whose increment is x. Unfortunately this does
not define a function, because when x = 0 there is no natural number
y whose increment is equal to x (Axiom 2.3). On the other hand, we
can legitimately define a decrement function h : N\{0} → N associated
to the property P(x, y) defined by y++ = x, because when x ∈ N\{0}
there is indeed exactly one natural number y such that y++ = x, thanks
to Lemma 2.2.10. Thus for instance h(4) = 3 and h(2n + 3) = 2n + 2,
but h(0) is undefined since 0 is not in the domain N\{0}.

Example 3.3.3. (Informal) This example requires the real numbers R,
which we will define in Chapter 5. One could try to define a square root
function √ : R → R by associating it to the property P(x, y) defined
by y² = x, i.e., we would want √x to be the number y such that y² = x.
Unfortunately there are two problems which prohibit this definition from
actually creating a function. The first is that there exist real numbers
x for which P(x, y) is never true, for instance if x = −1 then there is no
real number y such that y² = x. This problem however can be solved
by restricting the domain from R to the right half-line [0, +∞). The
second problem is that even when x ∈ [0, +∞), it is possible for there
to be more than one y in the range R for which y² = x, for instance if
x = 4 then both y = 2 and y = −2 obey the property P(x, y), i.e., both
+2 and −2 are square roots of 4. This problem can however be solved
by restricting the range from R to [0, +∞). Once one does this, then one
can correctly define a square root function √ : [0, +∞) → [0, +∞) using
the relation y² = x, thus √x is the unique number y ∈ [0, +∞) such
that y² = x.

One common way to define a function is simply to specify its domain,
its range, and how one generates the output f(x) from each input; this is
known as an explicit definition of a function. For instance, the function
f in Example 3.3.2 could be defined explicitly by saying that f has
domain and range equal to N, and f(x) := x++ for all x ∈ N. In other
cases we only define a function f by specifying what property P(x, y)
links the input x with the output f(x); this is an implicit definition
of a function. For instance, the square root function √x in Example
3.3.3 was defined implicitly by the relation (√x)² = x. Note that an
implicit definition is only valid if we know that for every input there is
exactly one output which obeys the implicit relation. In many cases we
omit specifying the domain and range of a function for brevity, and thus
for instance we could refer to the function f in Example 3.3.2 as “the
function f(x) := x++”, “the function x ↦ x++”, “the function x++”,
or even the extremely abbreviated “++”. However, too much of this
abbreviation can be dangerous; sometimes it is important to know what
the domain and range of the function is.

We observe that functions obey the axiom of substitution: if x = x',
then f(x) = f(x') (why?). In other words, equal inputs imply equal
outputs. On the other hand, unequal inputs do not necessarily ensure
unequal outputs, as the following example shows:

Example 3.3.4. Let X = N, Y = N, and let P(x, y) be the property
that y = 7. Then certainly for every x ∈ N there is exactly one y
for which P(x, y) is true, namely the number 7. Thus we can create
a function f : N → N associated to this property; it is simply the
constant function which assigns the output of f(x) = 7 to each input
x ∈ N. Thus it is certainly possible for different inputs to generate the
same output.

Remark 3.3.5. We are now using parentheses () to denote several
different things in mathematics; on one hand, we are using them to clarify
the order of operations (compare for instance 2 + (3 × 4) = 14 with
(2 + 3) × 4 = 20), but on the other hand we also use parentheses to
enclose the argument f(x) of a function or of a property such as P(x).
However, the two usages of parentheses usually are unambiguous from
context. For instance, if a is a number, then a(b + c) denotes the
expression a × (b + c), whereas if f is a function, then f(b + c) denotes
the output of f when the input is b + c. Sometimes the argument of a
function is denoted by subscripting instead of parentheses; for instance,
a sequence of natural numbers a₀, a₁, a₂, a₃, ... is, strictly speaking, a
function from N to N, but is denoted by n ↦ aₙ rather than n ↦ a(n).

Remark 3.3.6. Strictly speaking, functions are not sets, and sets are
not functions; it does not make sense to ask whether an object x is
an element of a function f, and it does not make sense to apply a
set A to an input x to create an output A(x). On the other hand,
it is possible to start with a function f : X → Y and construct its
graph {(x, f(x)) : x ∈ X}, which describes the function completely: see
Section 3.5.

We now define some basic concepts and notions for functions. The 
first notion is that of equality. 

Definition 3.3.7 (Equality of functions). Two functions f : X → Y,
g : X → Y with the same domain and range are said to be equal, f = g,
if and only if f(x) = g(x) for all x ∈ X. (If f(x) and g(x) agree for
some values of x, but not others, then we do not consider f and g to be
equal².)

² In Chapter 11.45, we shall introduce a weaker notion of equality, that of two
functions being equal almost everywhere.

Example 3.3.8. The functions x ↦ x² + 2x + 1 and x ↦ (x + 1)² are
equal on the domain R. The functions x ↦ x and x ↦ |x| are equal
on the positive real axis, but are not equal on R; thus the concept of
equality of functions can depend on the choice of domain.

Example 3.3.9. A rather boring example of a function is the empty
function f : ∅ → X from the empty set to an arbitrary set X. Since the
empty set has no elements, we do not need to specify what f does to any
input. Nevertheless, just as the empty set is a set, the empty function
is a function, albeit not a particularly interesting one. Note that for
each set X, there is only one function from ∅ to X, since Definition 3.3.7
asserts that all functions from ∅ to X are equal (why?).
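
The dependence of equality on the domain can be seen numerically. The Python sketch below is an informal illustration only: the helper name `equal_on` is ours, and because a program cannot test every real number, the "domains" are small ranges of integer sample points standing in for R and for the positive real axis. Agreement on samples is evidence, not proof, whereas a single disagreement does show that two functions are unequal.

```python
def equal_on(domain, f, g):
    """True iff f(x) = g(x) for every x in the given (finite) domain."""
    return all(f(x) == g(x) for x in domain)

expanded = lambda x: x**2 + 2*x + 1
factored = lambda x: (x + 1)**2
identity = lambda x: x

sample_reals = range(-5, 6)       # sample points standing in for R
sample_positive = range(1, 6)     # sample points on the positive axis

same_polynomial = equal_on(sample_reals, expanded, factored)
abs_vs_id_positive = equal_on(sample_positive, identity, abs)
abs_vs_id_everywhere = equal_on(sample_reals, identity, abs)

# On the empty domain there is nothing to check, so any two functions agree.
vacuous = equal_on([], identity, abs)
```

The last line reflects the situation of the empty function: with no inputs to compare, the condition for equality holds vacuously.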

This notion of equality obeys the usual axioms (Exercise 3.3.1). 

A fundamental operation available for functions is composition. 

Definition 3.3.10 (Composition). Let f : X → Y and g : Y → Z be
two functions, such that the range of f is the same set as the domain of
g. We then define the composition g ∘ f : X → Z of the two functions g
and f to be the function defined explicitly by the formula

(g ∘ f)(x) := g(f(x)).

If the range of f does not match the domain of g, we leave the
composition g ∘ f undefined.

It is easy to check that composition obeys the axiom of substitution 
(Exercise 3.3.1). 

Example 3.3.11. Let f : N → N be the function f(n) := 2n, and let
g : N → N be the function g(n) := n + 3. Then g ∘ f is the function

g ∘ f(n) = g(f(n)) = g(2n) = 2n + 3,

thus for instance g ∘ f(1) = 5, g ∘ f(2) = 7, and so forth. Meanwhile,
f ∘ g is the function

f ∘ g(n) = f(g(n)) = f(n + 3) = 2(n + 3) = 2n + 6,

thus for instance f ∘ g(1) = 8, f ∘ g(2) = 10, and so forth.

The above example shows that composition is not commutative: f ∘ g
and g ∘ f are not necessarily the same function. However, composition
is still associative:

Lemma 3.3.12 (Composition is associative). Let f : Z → W, g : Y →
Z, and h : X → Y be functions. Then f ∘ (g ∘ h) = (f ∘ g) ∘ h.

Proof. Since g ∘ h is a function from X to Z, f ∘ (g ∘ h) is a function from
X to W. Similarly f ∘ g is a function from Y to W, and hence (f ∘ g) ∘ h
is a function from X to W. Thus f ∘ (g ∘ h) and (f ∘ g) ∘ h have the same
domain and range. In order to check that they are equal, we see from
Definition 3.3.7 that we have to verify that (f ∘ (g ∘ h))(x) = ((f ∘ g) ∘ h)(x)
for all x ∈ X. But by Definition 3.3.10

(f ∘ (g ∘ h))(x) = f((g ∘ h)(x))
                 = f(g(h(x)))
                 = (f ∘ g)(h(x))
                 = ((f ∘ g) ∘ h)(x)

as desired. □
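
Composition is easy to experiment with in code. The Python sketch below is informal: the helper name `compose` is ours, the functions f and g are the doubling and add-three maps of Example 3.3.11, and the third function h is a hypothetical choice made only so that associativity can be tested. The checks run over a finite sample of inputs, so they illustrate the lemma rather than prove it.

```python
def compose(g, f):
    """Return the composition of g and f: the function x -> g(f(x)); f is applied first."""
    return lambda x: g(f(x))

f = lambda n: 2 * n      # f(n) := 2n
g = lambda n: n + 3      # g(n) := n + 3
h = lambda n: n * n      # a third function, chosen only for this sketch

g_after_f = compose(g, f)    # n -> 2n + 3
f_after_g = compose(f, g)    # n -> 2n + 6

sample = range(0, 20)

# Not commutative: the two orders already disagree at n = 1 (5 versus 8).
commute = all(g_after_f(n) == f_after_g(n) for n in sample)

# Associative: both bracketings send n to f(g(h(n))).
left = compose(f, compose(g, h))
right = compose(compose(f, g), h)
associative = all(left(n) == right(n) for n in sample)
```

Note that `compose(g, f)` takes its arguments in the same order as they are written in the composition, so the function applied first appears second.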

Remark 3.3.13. Note that while g appears to the left of f in the
expression g ∘ f, the function g ∘ f applies the right-most function f
first, before applying g. This is often confusing at first; it arises because
we traditionally place a function f to the left of its input x rather than
to the right. (There are some alternate mathematical notations in which 
the function is placed to the right of the input, thus we would write xf 
instead of f(x), but this notation has often proven to be more confusing 
than clarifying, and has not as yet become particularly popular.) 

We now describe certain special types of functions: one-to-one func- 
tions, onto functions, and invertible functions. 

Definition 3.3.14 (One-to-one functions). A function f is one-to-one
(or injective) if different elements map to different elements:

x ≠ x' ⟹ f(x) ≠ f(x').

Equivalently, a function is one-to-one if

f(x) = f(x') ⟹ x = x'.

Example 3.3.15. (Informal) The function f : Z → Z defined by
f(n) := n² is not one-to-one because the distinct elements −1, 1 map
to the same element 1. On the other hand, if we restrict this function
to the natural numbers, defining the function g : N → Z by g(n) := n²,
then g is now a one-to-one function. Thus the notion of a one-to-one
function depends not just on what the function does, but also what its
domain is.

Remark 3.3.16. If a function f : X → Y is not one-to-one, then one
can find distinct x and x' in the domain X such that f(x) = f(x'), thus
one can find two inputs which map to one output. Because of this, we
say that f is two-to-one instead of one-to-one.

Definition 3.3.17 (Onto functions). A function $f$ is onto (or surjective)
if $f(X) = Y$, i.e., every element in $Y$ comes from applying $f$ to some
element in $X$:

For every $y \in Y$, there exists $x \in X$ such that $f(x) = y$.

Example 3.3.18. (Informal) The function $f : \mathbf{Z} \to \mathbf{Z}$ defined by
$f(n) := n^2$ is not onto because the negative numbers are not in the image
of $f$. However, if we restrict the range $\mathbf{Z}$ to the set $A := \{n^2 : n \in \mathbf{Z}\}$
of square numbers, then the function $g : \mathbf{Z} \to A$ defined by $g(n) := n^2$
is now onto. Thus the notion of an onto function depends not just on
what the function does, but also what its range is.
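
A similar finite sketch illustrates how surjectivity depends on the chosen range. The helper `is_surjective` and the finite windows standing in for Z and for the set A of squares are assumptions made for illustration only.

```python
def is_surjective(func, domain, codomain):
    """Check that every element of a finite codomain is hit by some element of the domain."""
    return {func(x) for x in domain} == set(codomain)

square = lambda n: n * n

Z_window = range(-3, 4)                  # finite stand-in for the domain Z
big_range = range(-9, 10)                # finite stand-in for the range Z
squares = {n * n for n in Z_window}      # plays the role of A = {n^2 : n in Z}

onto_big_range = is_surjective(square, Z_window, big_range)
onto_squares = is_surjective(square, Z_window, squares)
```

With the larger range the check fails (no input maps to -1, for instance), while after shrinking the range to the set of squares it succeeds.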

Remark 3.3.19. The concepts of injectivity and surjectivity are in 
many ways dual to each other; see Exercises 3.3.2, 3.3.4, 3.3.5 for some 
evidence of this. 

Definition 3.3.20 (Bijective functions). Functions $f : X \to Y$ which
are both one-to-one and onto are also called bijective or invertible.

Example 3.3.21. Let $f : \{0, 1, 2\} \to \{3, 4\}$ be the function $f(0) := 3$,
$f(1) := 3$, $f(2) := 4$. This function is not bijective because if we set
$y = 3$, then there is more than one $x$ in $\{0, 1, 2\}$ such that $f(x) = y$ (this
is a failure of injectivity). Now let $g : \{0, 1\} \to \{2, 3, 4\}$ be the function
$g(0) := 2$, $g(1) := 3$; then $g$ is not bijective because if we set $y = 4$,
then there is no $x$ for which $g(x) = y$ (this is a failure of surjectivity).
Now let $h : \{0, 1, 2\} \to \{3, 4, 5\}$ be the function $h(0) := 3$, $h(1) := 4$,
$h(2) := 5$. Then $h$ is bijective, because each of the elements 3, 4, 5 comes
from exactly one element from 0, 1, 2.

Example 3.3.22. The function $f : \mathbf{N} \to \mathbf{N} \setminus \{0\}$ defined by
$f(n) := n{+}{+}$ is a bijection (in fact, this fact is simply restating Axioms 2.2, 2.3,
2.4). On the other hand, the function $g : \mathbf{N} \to \mathbf{N}$ defined by the same
definition $g(n) := n{+}{+}$ is not a bijection. Thus the notion of a bijective
function depends not just on what the function does, but also what its
range (and domain) are.

Remark 3.3.23. If a function $x \mapsto f(x)$ is bijective, then we sometimes
call $f$ a perfect matching or a one-to-one correspondence (not to be
confused with the notion of a one-to-one function), and denote the action
of $f$ using the notation $x \leftrightarrow f(x)$ instead of $x \mapsto f(x)$. Thus for instance
the function $h$ in the above example is the one-to-one correspondence
$0 \leftrightarrow 3$, $1 \leftrightarrow 4$, $2 \leftrightarrow 5$.

Remark 3.3.24. A common error is to say that a function $f : X \to Y$
is bijective iff “for every $x$ in $X$, there is exactly one $y$ in $Y$ such that
$y = f(x)$.” This is not what it means for $f$ to be bijective; rather, this is
merely stating what it means for $f$ to be a function. A function cannot
map one element to two different elements, for instance one cannot have
a function $f$ for which $f(0) = 1$ and also $f(0) = 2$. The functions
$f$, $g$ given in the previous example are not bijective, but they are still
functions, since each input still gives exactly one output.

If $f$ is bijective, then for every $y \in Y$, there is exactly one $x$ such
that $f(x) = y$ (there is at least one because of surjectivity, and at most
one because of injectivity). This value of $x$ is denoted $f^{-1}(y)$; thus $f^{-1}$
is a function from $Y$ to $X$. We call $f^{-1}$ the inverse of $f$.
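
For finite sets, the construction of the inverse can be sketched in Python by storing a function as a dictionary from inputs to outputs. The helper `inverse` is a hypothetical name, and the dictionaries `f` and `h` encode the functions from Example 3.3.21; the sketch assumes every value in the dictionary already lies in the stated range.

```python
def inverse(mapping, codomain):
    """Return the inverse of a bijection stored as a dict, or raise ValueError."""
    inv = {}
    for x, y in mapping.items():
        if y in inv:
            # Two different inputs reach y: "at most one x" fails.
            raise ValueError(f"not injective: {inv[y]} and {x} both map to {y}")
        inv[y] = x
    missing = set(codomain) - set(inv)
    if missing:
        # Some y has no preimage: "at least one x" fails.
        raise ValueError(f"not surjective: nothing maps to {sorted(missing)}")
    return inv

h = {0: 3, 1: 4, 2: 5}               # the bijection h of Example 3.3.21
h_inv = inverse(h, {3, 4, 5})

f = {0: 3, 1: 3, 2: 4}               # the non-injective f of Example 3.3.21
try:
    inverse(f, {3, 4})
    f_is_invertible = True
except ValueError:
    f_is_invertible = False
```

Here `h_inv` sends 3, 4, 5 back to 0, 1, 2 respectively, while the attempt to invert `f` is rejected because 0 and 1 both map to 3.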

— Exercises — 

Exercise 3.3.1. Show that the definition of equality in Definition 3.3.7 is re-
flexive, symmetric, and transitive. Also verify the substitution property: if
$f, \tilde f : X \to Y$ and $g, \tilde g : Y \to Z$ are functions such that $f = \tilde f$ and $g = \tilde g$, then
$g \circ f = \tilde g \circ \tilde f$.

Exercise 3.3.2. Let $f : X \to Y$ and $g : Y \to Z$ be functions. Show that if $f$
and $g$ are both injective, then so is $g \circ f$; similarly, show that if $f$ and $g$ are
both surjective, then so is $g \circ f$.

Exercise 3.3.3. When is the empty function injective? surjective? bijective? 

Exercise 3.3.4. In this section we give some cancellation laws for composition.
Let $f : X \to Y$, $\tilde f : X \to Y$, $g : Y \to Z$, and $\tilde g : Y \to Z$ be functions. Show
that if $g \circ f = g \circ \tilde f$ and $g$ is injective, then $f = \tilde f$. Is the same statement true
if $g$ is not injective? Show that if $g \circ f = \tilde g \circ f$ and $f$ is surjective, then $g = \tilde g$.
Is the same statement true if $f$ is not surjective?

Exercise 3.3.5. Let $f : X \to Y$ and $g : Y \to Z$ be functions. Show that if $g \circ f$
is injective, then $f$ must be injective. Is it true that $g$ must also be injective?
Show that if $g \circ f$ is surjective, then $g$ must be surjective. Is it true that $f$ must
also be surjective?

Exercise 3.3.6. Let $f : X \to Y$ be a bijective function, and let $f^{-1} : Y \to X$
be its inverse. Verify the cancellation laws $f^{-1}(f(x)) = x$ for all $x \in X$ and
$f(f^{-1}(y)) = y$ for all $y \in Y$. Conclude that $f^{-1}$ is also invertible, and has $f$
as its inverse (thus $(f^{-1})^{-1} = f$).

Exercise 3.3.7. Let $f : X \to Y$ and $g : Y \to Z$ be functions. Show that if $f$
and $g$ are bijective, then so is $g \circ f$, and we have $(g \circ f)^{-1} = f^{-1} \circ g^{-1}$.

Exercise 3.3.8. If $X$ is a subset of $Y$, let $\iota_{X \to Y} : X \to Y$ be the inclusion map
from $X$ to $Y$, defined by mapping $x \mapsto x$ for all $x \in X$, i.e., $\iota_{X \to Y}(x) := x$ for
all $x \in X$. The map $\iota_{X \to X}$ is in particular called the identity map on $X$.

(a) Show that if $X \subseteq Y \subseteq Z$ then $\iota_{Y \to Z} \circ \iota_{X \to Y} = \iota_{X \to Z}$.

(b) Show that if $f : A \to B$ is any function, then $f = f \circ \iota_{A \to A} = \iota_{B \to B} \circ f$.

(c) Show that, if $f : A \to B$ is a bijective function, then $f \circ f^{-1} = \iota_{B \to B}$
and $f^{-1} \circ f = \iota_{A \to A}$.

(d) Show that if $X$ and $Y$ are disjoint sets, and $f : X \to Z$ and $g : Y \to Z$
are functions, then there is a unique function $h : X \cup Y \to Z$ such that
$h \circ \iota_{X \to X \cup Y} = f$ and $h \circ \iota_{Y \to X \cup Y} = g$.

3.4 Images and inverse images 

We know that a function $f : X \to Y$ from a set $X$ to a set $Y$ can take
individual elements $x \in X$ to elements $f(x) \in Y$. Functions can also
take subsets in $X$ to subsets in $Y$:

Definition 3.4.1 (Images of sets). If $f : X \to Y$ is a function from $X$
to $Y$, and $S$ is a set in $X$, we define $f(S)$ to be the set

$$f(S) := \{f(x) : x \in S\};$$

this set is a subset of $Y$, and is sometimes called the image of $S$ under the
map $f$. We sometimes call $f(S)$ the forward image of $S$ to distinguish
it from the concept of the inverse image $f^{-1}(S)$ of $S$, which is defined
below.

Note that the set f(S) is well-defined thanks to the axiom of re- 
placement (Axiom 3.6). One can also define f(S) using the axiom of 
specification (Axiom 3.5) instead of replacement, but we leave this as a 
challenge to the reader.

Example 3.4.2. If $f : \mathbf{N} \to \mathbf{N}$ is the map $f(x) = 2x$, then the forward
image of $\{1, 2, 3\}$ is $\{2, 4, 6\}$:

$$f(\{1, 2, 3\}) = \{2, 4, 6\}.$$

More informally, to compute $f(S)$, we take every element $x$ of $S$, and
apply $f$ to each element individually, and then put all the resulting
objects together to form a new set.


In the above example, the image had the same size as the original set.
But sometimes the image can be smaller, because $f$ is not one-to-one
(see Definition 3.3.14):

Example 3.4.3. (Informal) Let $\mathbf{Z}$ be the set of integers (which we will
define rigorously in the next section) and let $f : \mathbf{Z} \to \mathbf{Z}$ be the map
$f(x) = x^2$, then

$$f(\{-1, 0, 1, 2\}) = \{0, 1, 4\}.$$

Note that $f$ is not one-to-one because $f(-1) = f(1)$.
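
For finite sets the forward image is exactly a set comprehension. The following minimal Python sketch reproduces the two examples above; the helper name `image` is a hypothetical choice.

```python
def image(func, subset):
    """Forward image f(S): apply func to every element of S and collect the results."""
    return {func(x) for x in subset}

double = lambda x: 2 * x     # the map of Example 3.4.2
square = lambda x: x * x     # the map of Example 3.4.3

image_of_double = image(double, {1, 2, 3})
image_of_square = image(square, {-1, 0, 1, 2})
```

The second image has only three elements although the input set has four, because -1 and 1 are sent to the same value.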


Note that

$$x \in S \implies f(x) \in f(S)$$

but in general

$$f(x) \in f(S) \nRightarrow x \in S;$$

for instance in the above informal example, $f(-2)$ lies in the set
$f(\{-1, 0, 1, 2\})$, but $-2$ is not in $\{-1, 0, 1, 2\}$. The correct statement
is

$$y \in f(S) \iff y = f(x) \text{ for some } x \in S$$

(why?).


Definition 3.4.4 (Inverse images). If $U$ is a subset of $Y$, we define the
set $f^{-1}(U)$ to be the set

$$f^{-1}(U) := \{x \in X : f(x) \in U\}.$$

In other words, $f^{-1}(U)$ consists of all the elements of $X$ which map into
$U$:

$$f(x) \in U \iff x \in f^{-1}(U).$$

We call $f^{-1}(U)$ the inverse image of $U$.

Example 3.4.5. If $f : \mathbf{N} \to \mathbf{N}$ is the map $f(x) = 2x$, then $f(\{1, 2, 3\}) =
\{2, 4, 6\}$, but $f^{-1}(\{1, 2, 3\}) = \{1\}$. Thus the forward image of $\{1, 2, 3\}$
and the backwards image of $\{1, 2, 3\}$ are quite different sets. Also note
that

$$f(f^{-1}(\{1, 2, 3\})) \neq \{1, 2, 3\}$$

(why?). 

Example 3.4.6. (Informal) If $f : \mathbf{Z} \to \mathbf{Z}$ is the map $f(x) = x^2$, then

$$f^{-1}(\{0, 1, 4\}) = \{-2, -1, 0, 1, 2\}.$$

Note that $f$ does not have to be invertible in order for $f^{-1}(U)$ to make
sense. Also note that images and inverse images do not quite invert each
other, for instance we have

$$f^{-1}(f(\{-1, 0, 1, 2\})) \neq \{-1, 0, 1, 2\}$$

(why?).
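
The two examples above can be replayed on finite windows of N and Z. In this hedged sketch the helpers `image` and `preimage` are hypothetical names, and the windows are assumed large enough to contain every relevant preimage.

```python
def image(func, subset):
    """Forward image f(S)."""
    return {func(x) for x in subset}

def preimage(func, domain, target):
    """Inverse image f^{-1}(U): all x in the domain with func(x) in U."""
    return {x for x in domain if func(x) in target}

double = lambda x: 2 * x
square = lambda x: x * x

N_window = range(0, 20)      # finite stand-in for the natural numbers
Z_window = range(-10, 11)    # finite stand-in for the integers

# Example 3.4.5: only 1 is sent into {1, 2, 3} by doubling.
pre_double = preimage(double, N_window, {1, 2, 3})
round_trip_double = image(double, pre_double)

# Example 3.4.6: the inverse image picks up the negative square roots too.
pre_square = preimage(square, Z_window, {0, 1, 4})
round_trip_square = preimage(square, Z_window, image(square, {-1, 0, 1, 2}))
```

Here `round_trip_double` is a proper subset of {1, 2, 3}, and `round_trip_square` is strictly larger than {-1, 0, 1, 2}, which is one way to see why the two displayed inequalities hold.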

Remark 3.4.7. If $f$ is a bijective function, then we have defined $f^{-1}$
in two slightly different ways, but this is not an issue because both
definitions are equivalent (Exercise 3.4.1).

As remarked earlier, functions are not sets. However, we do consider 
functions to be a type of object, and in particular we should be able to 
consider sets of functions. In particular, we should be able to consider 
the set of all functions from a set X to a set Y. To do this we need to 
introduce another axiom to set theory: 

Axiom 3.10 (Power set axiom). Let $X$ and $Y$ be sets. Then there exists
a set, denoted $Y^X$, which consists of all the functions from $X$ to $Y$, thus

$$f \in Y^X \iff (f \text{ is a function with domain } X \text{ and range } Y).$$

Example 3.4.8. Let $X = \{4, 7\}$ and $Y = \{0, 1\}$. Then the set $Y^X$
consists of four functions: the function that maps $4 \mapsto 0$ and $7 \mapsto 0$; the
function that maps $4 \mapsto 0$ and $7 \mapsto 1$; the function that maps $4 \mapsto 1$
and $7 \mapsto 0$; and the function that maps $4 \mapsto 1$ and $7 \mapsto 1$. The reason
we use the notation $Y^X$ to denote this set is that if $Y$ has $n$ elements
and $X$ has $m$ elements, then one can show that $Y^X$ has $n^m$ elements;
see Proposition 3.6.14(f).
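
For small finite sets one can simply list every function from X to Y and count them. The sketch below represents each function as a Python dictionary; the helper `all_functions` is a hypothetical name, and the enumeration only illustrates the count n^m rather than proving it.

```python
from itertools import product

def all_functions(domain, codomain):
    """List every function from a finite domain to a finite codomain, each as a dict."""
    inputs = sorted(domain)
    outputs = sorted(codomain)
    # Each tuple of outputs, one per input, determines exactly one function.
    return [dict(zip(inputs, values)) for values in product(outputs, repeat=len(inputs))]

functions = all_functions({4, 7}, {0, 1})
count_small = len(functions)                              # Y has 2 elements, X has 2
count_larger = len(all_functions({1, 2, 3}, {0, 1}))      # Y has 2 elements, X has 3
count_empty_domain = len(all_functions(set(), {0, 1}))    # only the empty function
```

The four dictionaries in `functions` are the four functions listed in the example above.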

One consequence of this axiom is

Lemma 3.4.9. Let $X$ be a set. Then the set

$$\{Y : Y \text{ is a subset of } X\}$$

is a set.

Proof. See Exercise 3.4.6. □ 

Remark 3.4.10. The set $\{Y : Y \text{ is a subset of } X\}$ is known as the
power set of $X$ and is denoted $2^X$. For instance, if $a, b, c$ are distinct
objects, we have

$$2^{\{a,b,c\}} = \{\emptyset, \{a\}, \{b\}, \{c\}, \{a, b\}, \{a, c\}, \{b, c\}, \{a, b, c\}\}.$$

Note that while $\{a, b, c\}$ has 3 elements, $2^{\{a,b,c\}}$ has $2^3 = 8$ elements.
This gives a hint as to why we refer to the power set of $X$ as $2^X$; we
return to this issue in Chapter 8.
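
For a finite set the power set can be listed directly. The sketch below uses `frozenset` so that subsets can themselves be elements of a Python set; the helper name `power_set` is a hypothetical choice, and the strings "a", "b", "c" stand in for three distinct objects.

```python
from itertools import combinations

def power_set(s):
    """Return the set of all subsets of a finite set, each subset as a frozenset."""
    items = list(s)
    return {
        frozenset(chosen)
        for size in range(len(items) + 1)      # subsets of every size 0..len(s)
        for chosen in combinations(items, size)
    }

subsets = power_set({"a", "b", "c"})
number_of_subsets = len(subsets)
```

The empty set and the full set both appear among the eight subsets.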

For sake of completeness, let us now add one further axiom to our 
set theory, in which we enhance the axiom of pairwise union to allow 
unions of much larger collections of sets. 

Axiom 3.11 (Union). Let $A$ be a set, all of whose elements are them-
selves sets. Then there exists a set $\bigcup A$ whose elements are precisely
those objects which are elements of the elements of $A$, thus for all ob-
jects $x$

$$x \in \bigcup A \iff (x \in S \text{ for some } S \in A).$$

Example 3.4.11. If $A = \{\{2, 3\}, \{3, 4\}, \{4, 5\}\}$, then $\bigcup A = \{2, 3, 4, 5\}$
(why?).

The axiom of union, combined with the axiom of pair set, implies
the axiom of pairwise union (Exercise 3.4.8). Another important conse-
quence of this axiom is that if one has some set $I$, and for every element
$\alpha \in I$ we have some set $A_\alpha$, then we can form the union set $\bigcup_{\alpha \in I} A_\alpha$ by
defining

$$\bigcup_{\alpha \in I} A_\alpha := \bigcup \{A_\alpha : \alpha \in I\},$$

which is a set thanks to the axiom of replacement and the axiom of
union. Thus for instance, if $I = \{1, 2, 3\}$, $A_1 := \{2, 3\}$, $A_2 := \{3, 4\}$, and
$A_3 := \{4, 5\}$, then $\bigcup_{\alpha \in \{1,2,3\}} A_\alpha = \{2, 3, 4, 5\}$. More generally, we see
that for any object $y$,

$$y \in \bigcup_{\alpha \in I} A_\alpha \iff (y \in A_\alpha \text{ for some } \alpha \in I). \tag{3.2}$$

In situations like this, we often refer to $I$ as an index set, and the elements
$\alpha$ of this index set as labels; the sets $A_\alpha$ are then called a family of sets,
and are indexed by the labels $\alpha \in I$. Note that if $I$ was empty, then
$\bigcup_{\alpha \in I} A_\alpha$ would automatically also be empty (why?).

We can similarly form intersections of families of sets, as long as the
index set is non-empty. More specifically, given any non-empty set $I$,
and given an assignment of a set $A_\alpha$ to each $\alpha \in I$, we can define the
intersection $\bigcap_{\alpha \in I} A_\alpha$ by first choosing some element $\beta$ of $I$ (which we
can do since $I$ is non-empty), and setting

$$\bigcap_{\alpha \in I} A_\alpha := \{x \in A_\beta : x \in A_\alpha \text{ for all } \alpha \in I\}, \tag{3.3}$$

which is a set by the axiom of specification. This definition may look like
it depends on the choice of $\beta$, but it does not (Exercise 3.4.9). Observe
that for any object $y$,

$$y \in \bigcap_{\alpha \in I} A_\alpha \iff (y \in A_\alpha \text{ for all } \alpha \in I) \tag{3.4}$$

(compare with (3.2)).
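
A family of sets indexed by a finite index set can be modelled in Python as a dictionary from labels to sets. The sketch below follows the two definitions above: the union collects everything, while the intersection first picks one label β and then filters the elements of that set. The helper names are hypothetical.

```python
def indexed_union(family):
    """Union of a family {label: set}; the empty family gives the empty set."""
    result = set()
    for members in family.values():
        result |= members
    return result

def indexed_intersection(family):
    """Intersection of a non-empty family {label: set}, following (3.3)."""
    if not family:
        raise ValueError("the index set must be non-empty")
    beta = next(iter(family))          # choose some label beta in the index set
    return {x for x in family[beta] if all(x in family[alpha] for alpha in family)}

family = {1: {2, 3}, 2: {3, 4}, 3: {4, 5}}    # A_1, A_2, A_3 from the text
union_of_family = indexed_union(family)
intersection_of_family = indexed_intersection(family)
overlapping = indexed_intersection({"p": {1, 2, 3}, "q": {2, 3, 4}})
```

Which label `next(iter(family))` returns does not affect the result, in line with the remark that the definition does not depend on the choice of β.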

Remark 3.4.12. The axioms of set theory that we have introduced
(Axioms 3.1-3.11, excluding the dangerous Axiom 3.8) are known as
the Zermelo-Fraenkel axioms of set theory³, after Ernst Zermelo (1871-
1953) and Abraham Fraenkel (1891-1965). There is one further axiom
we will eventually need, the famous axiom of choice (see Section 8.4),
giving rise to the Zermelo-Fraenkel-Choice (ZFC) axioms of set theory,
but we will not need this axiom for some time.

— Exercises — 

Exercise 3.4.1. Let $f : X \to Y$ be a bijective function, and let $f^{-1} : Y \to X$
be its inverse. Let $V$ be any subset of $Y$. Prove that the forward image of $V$
under $f^{-1}$ is the same set as the inverse image of $V$ under $f$; thus the fact that
both sets are denoted by $f^{-1}(V)$ will not lead to any inconsistency.


³ These axioms are formulated slightly differently in other texts, but all the formu-
lations can be shown to be equivalent to each other.

Exercise 3.4.2. Let $f : X \to Y$ be a function from one set $X$ to another set $Y$,
let $S$ be a subset of $X$, and let $U$ be a subset of $Y$. What, in general, can one
say about $f^{-1}(f(S))$ and $S$? What about $f(f^{-1}(U))$ and $U$?

Exercise 3.4.3. Let $A$, $B$ be two subsets of a set $X$, and let $f : X \to Y$ be
a function. Show that $f(A \cap B) \subseteq f(A) \cap f(B)$, that $f(A) \setminus f(B) \subseteq f(A \setminus B)$,
$f(A \cup B) = f(A) \cup f(B)$. For the first two statements, is it true that the $\subseteq$
relation can be improved to $=$?

Exercise 3.4.4. Let $f : X \to Y$ be a function from one set $X$ to another set $Y$,
and let $U$, $V$ be subsets of $Y$. Show that $f^{-1}(U \cup V) = f^{-1}(U) \cup f^{-1}(V)$, that
$f^{-1}(U \cap V) = f^{-1}(U) \cap f^{-1}(V)$, and that $f^{-1}(U \setminus V) = f^{-1}(U) \setminus f^{-1}(V)$.

Exercise 3.4.5. Let $f : X \to Y$ be a function from one set $X$ to another set $Y$.
Show that $f(f^{-1}(S)) = S$ for every $S \subseteq Y$ if and only if $f$ is surjective. Show
that $f^{-1}(f(S)) = S$ for every $S \subseteq X$ if and only if $f$ is injective.

Exercise 3.4.6. Prove Lemma 3.4.9. (Hint: start with the set $\{0, 1\}^X$ and apply
the replacement axiom, replacing each function $f$ with the object $f^{-1}(\{1\})$.)
See also Exercise 3.5.11.

Exercise 3.4.7. Let $X$, $Y$ be sets. Define a partial function from $X$ to $Y$ to
be any function $f : X' \to Y'$ whose domain $X'$ is a subset of $X$, and whose
range $Y'$ is a subset of $Y$. Show that the collection of all partial functions
from $X$ to $Y$ is itself a set. (Hint: use Exercise 3.4.6, the power set axiom, the
replacement axiom, and the union axiom.)

Exercise 3.4.8. Show that Axiom 3.4 can be deduced from Axiom 3.1, Axiom 
3.3 and Axiom 3.11. 

Exercise 3.4.9. Show that if $\beta$ and $\beta'$ are two elements of a set $I$, and to each
$\alpha \in I$ we assign a set $A_\alpha$, then

$$\{x \in A_\beta : x \in A_\alpha \text{ for all } \alpha \in I\} = \{x \in A_{\beta'} : x \in A_\alpha \text{ for all } \alpha \in I\},$$

and so the definition of $\bigcap_{\alpha \in I} A_\alpha$ defined in (3.3) does not depend on $\beta$. Also
explain why (3.4) is true.

Exercise 3.4.10. Suppose that $I$ and $J$ are two sets, and for all $\alpha \in I \cup J$ let
$A_\alpha$ be a set. Show that $(\bigcup_{\alpha \in I} A_\alpha) \cup (\bigcup_{\alpha \in J} A_\alpha) = \bigcup_{\alpha \in I \cup J} A_\alpha$. If $I$ and $J$ are
non-empty, show that $(\bigcap_{\alpha \in I} A_\alpha) \cap (\bigcap_{\alpha \in J} A_\alpha) = \bigcap_{\alpha \in I \cup J} A_\alpha$.

Exercise 3.4.11. Let $X$ be a set, let $I$ be a non-empty set, and for all $\alpha \in I$ let
$A_\alpha$ be a subset of $X$. Show that

$$X \setminus \bigcup_{\alpha \in I} A_\alpha = \bigcap_{\alpha \in I} (X \setminus A_\alpha)$$

and

$$X \setminus \bigcap_{\alpha \in I} A_\alpha = \bigcup_{\alpha \in I} (X \setminus A_\alpha).$$

This should be compared with de Morgan’s laws in Proposition 3.1.28 (although
one cannot derive the above identities directly from de Morgan’s laws, as $I$ could
be infinite).

3.5 Cartesian products

In addition to the basic operations of union, intersection, and differ- 
encing, another fundamental operation on sets is that of the Cartesian 
product. 

Definition 3.5.1 (Ordered pair). If $x$ and $y$ are any objects (possibly
equal), we define the ordered pair $(x, y)$ to be a new object, consisting
of $x$ as its first component and $y$ as its second component. Two ordered
pairs $(x, y)$ and $(x', y')$ are considered equal if and only if both their
components match, i.e.

$$(x, y) = (x', y') \iff (x = x' \text{ and } y = y'). \tag{3.5}$$

This obeys the usual axioms of equality (Exercise 3.5.3). Thus for in-
stance, the pair $(3, 5)$ is equal to the pair $(2 + 1, 3 + 2)$, but is distinct
from the pairs $(5, 3)$, $(3, 3)$, and $(2, 5)$. (This is in contrast to sets, where
$\{3, 5\}$ and $\{5, 3\}$ are equal.)

Remark 3.5.2. Strictly speaking, this definition is partly an axiom,
because we have simply postulated that, given any two objects x and y,
an object of the form (x, y) exists. However, it is possible to define
an ordered pair using the axioms of set theory in such a way that we do 
not need any further postulates (see Exercise 3.5.1). 

Remark 3.5.3. We have now “overloaded” the parenthesis symbols () 
once again; they now are not only used to denote grouping of operators 
and arguments of functions, but also to enclose ordered pairs. This is 
usually not a problem in practice as one can still determine what usage 
the symbols () were intended for from context. 

Definition 3.5.4 (Cartesian product). If $X$ and $Y$ are sets, then we
define the Cartesian product $X \times Y$ to be the collection of ordered pairs,
whose first component lies in $X$ and second component lies in $Y$, thus

$$X \times Y = \{(x, y) : x \in X, y \in Y\}$$

or equivalently

$$a \in (X \times Y) \iff (a = (x, y) \text{ for some } x \in X \text{ and } y \in Y).$$

Remark 3.5.5. We shall simply assume that our notion of ordered pair
is such that whenever $X$ and $Y$ are sets, the Cartesian product $X \times Y$ is
also a set. This is however not a problem in practice; see Exercise 3.5.1.

Example 3.5.6. If $X := \{1, 2\}$ and $Y := \{3, 4, 5\}$, then

$$X \times Y = \{(1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5)\}$$

and

$$Y \times X = \{(3, 1), (4, 1), (5, 1), (3, 2), (4, 2), (5, 2)\}.$$

Thus, strictly speaking, $X \times Y$ and $Y \times X$ are different sets, although
they are very similar. For instance, they always have the same number
of elements (Exercise 3.6.5).
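
Python tuples behave like ordered pairs (equality is componentwise), so the example above can be reproduced with `itertools.product`. The dictionary `swap` below is an illustrative assumption showing one way to match the elements of the two products.

```python
from itertools import product

X = {1, 2}
Y = {3, 4, 5}

X_times_Y = set(product(X, Y))    # all pairs (x, y) with x in X and y in Y
Y_times_X = set(product(Y, X))    # all pairs (y, x) with y in Y and x in X

# Swapping the components matches each pair of X x Y with a pair of Y x X.
swap = {(x, y): (y, x) for (x, y) in X_times_Y}

pairs_are_ordered = (3, 5) != (5, 3)     # order matters for pairs
sets_are_unordered = {3, 5} == {5, 3}    # but not for sets
```

The two products are different sets with the same number of elements, and the values of `swap` are exactly the elements of `Y_times_X`.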

Let $f : X \times Y \to Z$ be a function whose domain $X \times Y$ is a Cartesian
product of two other sets $X$ and $Y$. Then $f$ can either be thought of
as a function of one variable, mapping the single input of an ordered
pair $(x, y)$ in $X \times Y$ to an output $f(x, y)$ in $Z$, or as a function of two
variables, mapping an input $x \in X$ and another input $y \in Y$ to a single
output $f(x, y)$ in $Z$. While the two notions are technically different, we
will not bother to distinguish the two, and think of $f$ simultaneously
as a function of one variable with domain $X \times Y$ and as a function of
two variables with domains $X$ and $Y$. Thus for instance the addition
operation $+$ on the natural numbers can now be re-interpreted as a
function $+ : \mathbf{N} \times \mathbf{N} \to \mathbf{N}$, defined by $(x, y) \mapsto x + y$.

One can of course generalize the concept of ordered pairs to ordered 
triples, ordered quadruples, etc: 

Definition 3.5.7 (Ordered n-tuple and n-fold Cartesian product). Let
$n$ be a natural number. An ordered $n$-tuple $(x_i)_{1 \leq i \leq n}$ (also denoted
$(x_1, \ldots, x_n)$) is a collection of objects $x_i$, one for every natural number
$i$ between 1 and $n$; we refer to $x_i$ as the $i$th component of the ordered
$n$-tuple. Two ordered $n$-tuples $(x_i)_{1 \leq i \leq n}$ and $(y_i)_{1 \leq i \leq n}$ are said to be
equal iff $x_i = y_i$ for all $1 \leq i \leq n$. If $(X_i)_{1 \leq i \leq n}$ is an ordered $n$-tuple of
sets, we define their Cartesian product $\prod_{1 \leq i \leq n} X_i$ (also denoted $\prod_{i=1}^n X_i$
or $X_1 \times \ldots \times X_n$) by

$$\prod_{1 \leq i \leq n} X_i := \{(x_i)_{1 \leq i \leq n} : x_i \in X_i \text{ for all } 1 \leq i \leq n\}.$$

Again, this definition simply postulates that an ordered n-tuple and 
a Cartesian product always exist when needed, but using the axioms of 
set theory one can explicitly construct these objects (Exercise 3.5.2). 

Remark 3.5.8. One can show that $\prod_{1 \leq i \leq n} X_i$ is indeed a set. Indeed,
from the power set axiom we can consider the set of all functions $i \mapsto x_i$
from the domain $\{1 \leq i \leq n\}$ to the range $\bigcup_{1 \leq i \leq n} X_i$, and then we can
use the axiom of specification to restrict to those functions
$i \mapsto x_i$ for which $x_i \in X_i$ for all $1 \leq i \leq n$. One can generalize this
construction to infinite Cartesian products, see Definition 8.4.1.

Example 3.5.9. Let $a_1, b_1, a_2, b_2, a_3, b_3$ be objects, and let $X_1 := \{a_1, b_1\}$,
$X_2 := \{a_2, b_2\}$, and $X_3 := \{a_3, b_3\}$. Then we have

$$\begin{aligned}
X_1 \times X_2 \times X_3 = \{&(a_1, a_2, a_3), (a_1, a_2, b_3), (a_1, b_2, a_3), (a_1, b_2, b_3), \\
&(b_1, a_2, a_3), (b_1, a_2, b_3), (b_1, b_2, a_3), (b_1, b_2, b_3)\}
\end{aligned}$$

$$\begin{aligned}
(X_1 \times X_2) \times X_3 = \{&((a_1, a_2), a_3), ((a_1, a_2), b_3), ((a_1, b_2), a_3), ((a_1, b_2), b_3), \\
&((b_1, a_2), a_3), ((b_1, a_2), b_3), ((b_1, b_2), a_3), ((b_1, b_2), b_3)\}
\end{aligned}$$

$$\begin{aligned}
X_1 \times (X_2 \times X_3) = \{&(a_1, (a_2, a_3)), (a_1, (a_2, b_3)), (a_1, (b_2, a_3)), (a_1, (b_2, b_3)), \\
&(b_1, (a_2, a_3)), (b_1, (a_2, b_3)), (b_1, (b_2, a_3)), (b_1, (b_2, b_3))\}.
\end{aligned}$$

Thus, strictly speaking, the sets $X_1 \times X_2 \times X_3$, $(X_1 \times X_2) \times X_3$, and
$X_1 \times (X_2 \times X_3)$ are distinct. However, they are clearly very related to
each other (for instance, there are obvious bijections between any two
of the three sets), and it is common in practice to neglect the minor
distinctions between these sets and pretend that they are in fact equal.
Thus a function $f : X_1 \times X_2 \times X_3 \to Y$ can be thought of as a function of
one variable $(x_1, x_2, x_3) \in X_1 \times X_2 \times X_3$, or as a function of three variables
$x_1 \in X_1$, $x_2 \in X_2$, $x_3 \in X_3$, or as a function of two variables $x_1 \in X_1$,
$(x_2, x_3) \in X_2 \times X_3$, and so forth; we will not bother to distinguish
between these different perspectives.
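
The distinction between the flat and the nested products, and the obvious matching between them, can be seen concretely in Python. In this sketch the strings "a1", "b1", and so on are stand-ins for the objects of the example, and the dictionary `flatten_left` is an illustrative assumption.

```python
from itertools import product

X1, X2, X3 = {"a1", "b1"}, {"a2", "b2"}, {"a3", "b3"}

flat = set(product(X1, X2, X3))                         # triples (x1, x2, x3)
left_nested = set(product(set(product(X1, X2)), X3))    # pairs ((x1, x2), x3)
right_nested = set(product(X1, set(product(X2, X3))))   # pairs (x1, (x2, x3))

# Each nested pair ((x1, x2), x3) corresponds to exactly one flat triple.
flatten_left = {((x1, x2), x3): (x1, x2, x3) for ((x1, x2), x3) in left_nested}
```

All three sets have eight elements and `flatten_left` hits every flat triple exactly once, yet no element of `flat` is an element of `left_nested`, since a triple is never equal to a pair.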

Remark 3.5.10. An ordered $n$-tuple $x_1, \ldots, x_n$ of objects is also called
an ordered sequence of n elements, or a finite sequence for short. In 
Chapter 5 we shall also introduce the very useful concept of an infinite 
sequence. 

Example 3.5.11. If $x$ is an object, then $(x)$ is a 1-tuple, which we
shall identify with $x$ itself (even though the two are, strictly speaking,
not the same object). If $X_1$ is any set, then the Cartesian prod-
uct $\prod_{1 \leq i \leq 1} X_i$ is just $X_1$ (why?). Also, the empty Cartesian product
$\prod_{1 \leq i \leq 0} X_i$ gives, not the empty set $\{\}$, but rather the singleton set $\{()\}$
whose only element is the 0-tuple $()$, also known as the empty tuple.

If $n$ is a natural number, we often write $X^n$ as shorthand for the
$n$-fold Cartesian product $X^n := \prod_{1 \leq i \leq n} X$. Thus $X^1$ is essentially the
same set as $X$ (if we ignore the distinction between an object $x$ and the
1-tuple $(x)$), while $X^2$ is the Cartesian product $X \times X$. The set $X^0$ is a
singleton set $\{()\}$ (why?).
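
Python's `itertools.product` follows the same convention for small powers, which gives a quick way to see the cases n = 0, 1, 2. The helper `power` is a hypothetical name, and the set `X` is an arbitrary sample.

```python
from itertools import product

def power(X, n):
    """The n-fold Cartesian product X^n of a finite set, as a set of n-tuples."""
    return set(product(X, repeat=n))

X = {7, 8, 9}

X0 = power(X, 0)    # contains only the empty tuple ()
X1 = power(X, 1)    # 1-tuples (x,), which the text identifies with x itself
X2 = power(X, 2)    # ordinary pairs
```

Note that `X1` consists of 1-tuples such as `(7,)` rather than the numbers themselves, which is exactly the distinction the text chooses to ignore.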

We can now generalize the single choice lemma (Lemma 3.1.6) to 
allow for multiple (but finite) number of choices. 

Lemma 3.5.12 (Finite choice). Let $n \geq 1$ be a natural number, and
for each natural number $1 \leq i \leq n$, let $X_i$ be a non-empty set. Then
there exists an $n$-tuple $(x_i)_{1 \leq i \leq n}$ such that $x_i \in X_i$ for all $1 \leq i \leq n$.
In other words, if each $X_i$ is non-empty, then the set $\prod_{1 \leq i \leq n} X_i$ is also
non-empty.

Proof. We induct on $n$ (starting with the base case $n = 1$; the claim is
also vacuously true with $n = 0$ but is not particularly interesting in that
case). When $n = 1$ the claim follows from Lemma 3.1.6 (why?). Now
suppose inductively that the claim has already been proven for some $n$;
we will now prove it for $n{+}{+}$. Let $X_1, \ldots, X_{n{+}{+}}$ be a collection of non-
empty sets. By induction hypothesis, we can find an $n$-tuple $(x_i)_{1 \leq i \leq n}$
such that $x_i \in X_i$ for all $1 \leq i \leq n$. Also, since $X_{n{+}{+}}$ is non-empty, by
Lemma 3.1.6 we may find an object $a$ such that $a \in X_{n{+}{+}}$. If we thus
define the $n{+}{+}$-tuple $(y_i)_{1 \leq i \leq n{+}{+}}$ by setting $y_i := x_i$ when $1 \leq i \leq n$
and $y_i := a$ when $i = n{+}{+}$, it is clear that $y_i \in X_i$ for all $1 \leq i \leq n{+}{+}$,
thus closing the induction. □
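
The induction in this proof translates directly into a recursive procedure for finite lists of finite non-empty sets. The following is a sketch under that assumption; the function name `finite_choice` is hypothetical, and `next(iter(...))` plays the role of the single choice lemma.

```python
def finite_choice(sets):
    """Given a non-empty list of non-empty sets, return a tuple with one element from each."""
    if len(sets) == 1:
        # Base case n = 1: a single choice from the only set.
        return (next(iter(sets[0])),)
    earlier = finite_choice(sets[:-1])    # induction hypothesis: an n-tuple of choices
    a = next(iter(sets[-1]))              # single choice from the last set
    return earlier + (a,)                 # extend to an (n++)-tuple

sample_sets = [{1, 2}, {3}, {4, 5, 6}]
choice = finite_choice(sample_sets)
```

Which element is picked from each set is unspecified, just as in the lemma; only membership of each component in the corresponding set is guaranteed.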

Remark 3.5.13. It is intuitively plausible that this lemma should be 
extended to allow for an infinite number of choices, but this cannot be 
done automatically; it requires an additional axiom, the axiom of choice. 
See Section 8.4. 


— Exercises — 

Exercise 3.5.1. Suppose we define the ordered pair $(x, y)$ for any objects $x$
and $y$ by the formula $(x, y) := \{\{x\}, \{x, y\}\}$ (thus using several applications of
Axiom 3.3). Thus for instance $(1, 2)$ is the set $\{\{1\}, \{1, 2\}\}$, $(2, 1)$ is the set
$\{\{2\}, \{2, 1\}\}$, and $(1, 1)$ is the set $\{\{1\}\}$. Show that such a definition indeed
obeys the property (3.5), and also whenever $X$ and $Y$ are sets, the Cartesian
product $X \times Y$ is also a set. Thus this definition can be validly used as a
definition of an ordered pair. For an additional challenge, show that the al-
ternate definition $(x, y) := \{x, \{x, y\}\}$ also verifies (3.5) and is thus also an
acceptable definition of ordered pair. (For this latter task one needs the axiom
of regularity, and in particular Exercise 3.2.2.)

Exercise 3.5.2. Suppose we define an ordered $n$-tuple to be a surjective function
$x : \{i \in \mathbf{N} : 1 \leq i \leq n\} \to X$ whose range is some arbitrary set $X$ (so different
ordered $n$-tuples are allowed to have different ranges); we then write $x_i$ for
$x(i)$, and also write $x$ as $(x_i)_{1 \leq i \leq n}$. Using this definition, verify that we have
$(x_i)_{1 \leq i \leq n} = (y_i)_{1 \leq i \leq n}$ if and only if $x_i = y_i$ for all $1 \leq i \leq n$. Also, show
that if $(X_i)_{1 \leq i \leq n}$ are an ordered $n$-tuple of sets, then the Cartesian product,
as defined in Definition 3.5.7, is indeed a set. (Hint: use Exercise 3.4.7 and the
axiom of specification.)

Exercise 3.5.3. Show that the definitions of equality for ordered pair and or- 
dered n-tuple obey the reflexivity, symmetry, and transitivity axioms. 

Exercise 3.5.4. Let $A$, $B$, $C$ be sets. Show that $A \times (B \cup C) = (A \times B) \cup (A \times C)$,
that $A \times (B \cap C) = (A \times B) \cap (A \times C)$, and that $A \times (B \setminus C) = (A \times B) \setminus (A \times C)$.
(One can of course prove similar identities in which the roles of the left and
right factors of the Cartesian product are reversed.)

Exercise 3.5.5. Let $A$, $B$, $C$, $D$ be sets. Show that $(A \times B) \cap (C \times D) =
(A \cap C) \times (B \cap D)$. Is it true that $(A \times B) \cup (C \times D) = (A \cup C) \times (B \cup D)$? Is it
true that $(A \times B) \setminus (C \times D) = (A \setminus C) \times (B \setminus D)$?

Exercise 3.5.6. Let $A$, $B$, $C$, $D$ be non-empty sets. Show that $A \times B \subseteq C \times D$
if and only if $A \subseteq C$ and $B \subseteq D$, and that $A \times B = C \times D$ if and only if
$A = C$ and $B = D$. What happens if the hypotheses that the $A$, $B$, $C$, $D$ are
all non-empty are removed?

Exercise 3.5.7. Let $X$, $Y$ be sets, and let $\pi_{X \times Y \to X} : X \times Y \to X$ and
$\pi_{X \times Y \to Y} : X \times Y \to Y$ be the maps $\pi_{X \times Y \to X}(x, y) := x$ and $\pi_{X \times Y \to Y}(x, y) := y$; these
maps are known as the co-ordinate functions on $X \times Y$. Show that for any
functions $f : Z \to X$ and $g : Z \to Y$, there exists a unique function
$h : Z \to X \times Y$ such that $\pi_{X \times Y \to X} \circ h = f$ and $\pi_{X \times Y \to Y} \circ h = g$. (Compare this to the
last part of Exercise 3.3.8, and to Exercise 3.1.7.) This function $h$ is known as
the direct sum of $f$ and $g$ and is denoted $h = f \oplus g$.

Exercise 3.5.8. Let $X_1, \ldots, X_n$ be sets. Show that the Cartesian product
$\prod_{i=1}^n X_i$ is empty if and only if at least one of the $X_i$ is empty.

Exercise 3.5.9. Suppose that $I$ and $J$ are two sets, and for all $\alpha \in I$ let $A_\alpha$ be
a set, and for all $\beta \in J$ let $B_\beta$ be a set. Show that
$(\bigcup_{\alpha \in I} A_\alpha) \cap (\bigcup_{\beta \in J} B_\beta) = \bigcup_{(\alpha, \beta) \in I \times J} (A_\alpha \cap B_\beta)$.

Exercise 3.5.10. If $f : X \to Y$ is a function, define the graph of $f$ to be the
subset of $X \times Y$ defined by $\{(x, f(x)) : x \in X\}$. Show that two functions
$f : X \to Y$, $\tilde f : X \to Y$ are equal if and only if they have the same graph.
Conversely, if $G$ is any subset of $X \times Y$ with the property that for each $x \in X$,
the set $\{y \in Y : (x, y) \in G\}$ has exactly one element (or in other words, $G$
obeys the vertical line test), show that there is exactly one function $f : X \to Y$
whose graph is equal to $G$.

Exercise 3.5.11. Show that Axiom 3.10 can in fact be deduced from Lemma
3.4.9 and the other axioms of set theory, and thus Lemma 3.4.9 can be used
as an alternate formulation of the power set axiom. (Hint: for any two sets $X$
and $Y$, use Lemma 3.4.9 and the axiom of specification to construct the set of
all subsets of $X \times Y$ which obey the vertical line test. Then use Exercise 3.5.10
and the axiom of replacement.)

Exercise 3.5.12. This exercise will establish a rigorous version of Proposition
2.1.16. Let $f : \mathbf{N} \times \mathbf{N} \to \mathbf{N}$ be a function, and let $c$ be a natural number. Show
that there exists a function $a : \mathbf{N} \to \mathbf{N}$ such that

$$a(0) = c$$

and

$$a(n{+}{+}) = f(n, a(n)) \text{ for all } n \in \mathbf{N},$$

and furthermore that this function is unique. (Hint: first show inductively, by
a modification of the proof of Lemma 3.5.12, that for every natural number
$N \in \mathbf{N}$, there exists a unique function $a_N : \{n \in \mathbf{N} : n \leq N\} \to \mathbf{N}$ such
that $a_N(0) = c$ and $a_N(n{+}{+}) = f(n, a_N(n))$ for all $n \in \mathbf{N}$ such that $n < N$.)
For an additional challenge, prove this result without using any properties
of the natural numbers other than the Peano axioms directly (in particular,
without using the ordering of the natural numbers, and without appealing to
Proposition 2.1.16). (Hint: first show inductively, using only the Peano axioms
and basic set theory, that for every natural number $N \in \mathbf{N}$, there exists a
unique pair $A_N$, $B_N$ of subsets of $\mathbf{N}$ which obeys the following properties:
(a) $A_N \cap B_N = \emptyset$, (b) $A_N \cup B_N = \mathbf{N}$, (c) $0 \in A_N$, (d) $N{+}{+} \in B_N$, (e)
Whenever $n \in B_N$, we have $n{+}{+} \in B_N$. (f) Whenever $n \in A_N$ and $n \neq N$,
we have $n{+}{+} \in A_N$. Once one obtains these sets, use $A_N$ as a substitute for
$\{n \in \mathbf{N} : n \leq N\}$ in the previous argument.)

Exercise 3.5.13. The purpose of this exercise is to show that there is essentially
only one version of the natural number system in set theory (cf. the discussion
in Remark 2.1.12). Suppose we have a set $\mathbf{N}'$ of “alternative natural numbers”,
an “alternative zero” $0'$, and an “alternative increment operation” which takes
any alternative natural number $n' \in \mathbf{N}'$ and returns another alternative natural
number $n'{+}{+}' \in \mathbf{N}'$, such that the Peano axioms (Axioms 2.1-2.5) all hold
with the natural numbers, zero, and increment replaced by their alternative
counterparts. Show that there exists a bijection $f : \mathbf{N} \to \mathbf{N}'$ from the natural
numbers to the alternative natural numbers such that $f(0) = 0'$, and such that
for any $n \in \mathbf{N}$ and $n' \in \mathbf{N}'$, we have $f(n) = n'$ if and only if $f(n{+}{+}) = n'{+}{+}'$.
(Hint: use Exercise 3.5.12.)


3.6 Cardinality of sets 

In the previous chapter we defined the natural numbers axiomatically, 
assuming that they were equipped with a 0 and an increment operation, 
and assuming five axioms on these numbers. Philosophically, this is quite
different from one of our main conceptualizations of natural numbers -
that of cardinality, or measuring how many elements there are in a set.
Indeed, the Peano axiom approach treats natural numbers more like
ordinals than cardinals. (The cardinals are One, Two, Three, ..., and
are used to count how many things there are in a set. The ordinals are 
First, Second, Third, ..., and are used to order a sequence of objects. 
There is a subtle difference between the two, especially when comparing 
infinite cardinals with infinite ordinals, but this is beyond the scope 
of this text). We paid a lot of attention to what number came next 
after a given number n - which is an operation which is quite natural 
for ordinals, but less so for cardinals - but did not address the issue of 
whether these numbers could be used to count sets. The purpose of this 
section is to address this issue by noting that the natural numbers can 
be used to count the cardinality of sets, as long as the set is finite. 

The first thing is to work out when two sets have the same size: it 
seems clear that the sets {1,2,3} and {4,5,6} have the same size, but 
that both have a different size from {8, 9}. One way to define this is to 
say that two sets have the same size if they have the same number of 
elements, but we have not yet defined what the “number of elements” 
in a set is. Besides, this runs into problems when a set is infinite. 

The right way to define the concept of “two sets having the same 
size” is not immediately obvious, but can be worked out with some 
thought. One intuitive reason why the sets {1,2,3} and {4,5,6} have 
the same size is that one can match the elements of the first set with 
the elements in the second set in a one-to-one correspondence: 1 ↔ 4,
2 ↔ 5, 3 ↔ 6. (Indeed, this is how we first learn to count a set: we
correspond the set we are trying to count with another set, such as a
set of fingers on your hand). We will use this intuitive understanding as
our rigorous basis for “having the same size”.

Definition 3.6.1 (Equal cardinality). We say that two sets X and Y
have equal cardinality iff there exists a bijection f : X → Y from X
to Y.

Example 3.6.2. The sets {0, 1, 2} and {3, 4, 5} have equal cardinality,
since we can find a bijection between the two sets. Note that we do not
yet know whether {0, 1, 2} and {3, 4} have equal cardinality; we know
that one of the functions f from {0, 1, 2} to {3, 4} is not a bijection, but
we have not yet ruled out that there might be some other bijection
from one set to the other. (It turns out that they do not have equal
cardinality, but we will prove this a little later). Note that this definition
makes sense regardless of whether X is finite or infinite (in fact, we
haven’t even defined what finite means yet).
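
For very small sets, the question in Example 3.6.2 can be explored by brute force. The following Python sketch is an informal illustration and not part of the formal development: the helper name `bijections` is hypothetical, and Python's built-in `len` is used as a shortcut for checking injectivity, which the text has not yet justified.

```python
from itertools import product

def bijections(X, Y):
    """Return every bijection from X to Y, each encoded as a dict."""
    X, Y = sorted(X), sorted(Y)
    found = []
    # Each tuple of images describes one function f : X -> Y.
    for images in product(Y, repeat=len(X)):
        one_to_one = len(set(images)) == len(X)
        onto = set(images) == set(Y)
        if one_to_one and onto:
            found.append(dict(zip(X, images)))
    return found

print(len(bijections({0, 1, 2}, {3, 4, 5})))  # 6
print(len(bijections({0, 1, 2}, {3, 4})))     # 0
```

An exhaustive search of this kind only works for sets small enough to enumerate; the propositions below give the general argument.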

Remark 3.6.3. The fact that two sets have equal cardinality does not 
preclude one of the sets from containing the other. For instance, if X 
is the set of natural numbers and Y is the set of even natural numbers, 
then the map f : X → Y defined by f(n) := 2n is a bijection from X
to Y (why?), and so X and Y have equal cardinality, despite Y being a 
subset of X and seeming intuitively as if it should only have “half” of 
the elements of X. 

The notion of having equal cardinality is an equivalence relation: 

Proposition 3.6.4. Let X, Y, Z be sets. Then X has equal cardinality
with X. If X has equal cardinality with Y, then Y has equal cardinality
with X. If X has equal cardinality with Y and Y has equal cardinality
with Z, then X has equal cardinality with Z.

Proof. See Exercise 3.6.1. □ 

Let n be a natural number. Now we want to say when a set X has n
elements. Certainly we want the set {i ∈ N : 1 ≤ i ≤ n} = {1, 2, ..., n}
to have n elements. (This is true even when n = 0; the set {i ∈ N : 1 ≤
i ≤ 0} is just the empty set.) Using our notion of equal cardinality, we
thus define:

Definition 3.6.5. Let n be a natural number. A set X is said to have
cardinality n, iff it has equal cardinality with {i ∈ N : 1 ≤ i ≤ n}. We
also say that X has n elements iff it has cardinality n.

Remark 3.6.6. One can use the set {i ∈ N : i < n} instead of {i ∈ N :
1 ≤ i ≤ n}, since these two sets clearly have equal cardinality. (Why?
What is the bijection?)

Example 3.6.7. Let a, b, c, d be distinct objects. Then {a, b, c, d} has
the same cardinality as {i ∈ N : i < 4} = {0, 1, 2, 3} or {i ∈ N : 1 ≤ i ≤
4} = {1, 2, 3, 4} and thus has cardinality 4. Similarly, the set {a} has
cardinality 1.

There might be one problem with this definition: a set might have 
two different cardinalities. But this is not possible: 

Proposition 3.6.8 (Uniqueness of cardinality). Let X be a set with
some cardinality n. Then X cannot have any other cardinality, i.e., X
cannot have cardinality m for any m ≠ n.

Before we prove this proposition, we need a lemma.

Lemma 3.6.9. Suppose that n ≥ 1, and X has cardinality n. Then X
is non-empty, and if x is any element of X, then the set X - {x} (i.e.,
X with the element x removed) has cardinality⁴ n - 1.

Proof. If X is empty then it clearly cannot have the same cardinality as
the non-empty set {i ∈ N : 1 ≤ i ≤ n}, as there is no bijection from the
empty set to a non-empty set (why?). Now let x be an element of X.
Since X has the same cardinality as {i ∈ N : 1 ≤ i ≤ n}, we thus have
a bijection f from X to {i ∈ N : 1 ≤ i ≤ n}. In particular, f(x) is a
natural number between 1 and n. Now define the function g from X - {x}
to {i ∈ N : 1 ≤ i ≤ n - 1} by the following rule: for any y ∈ X - {x},
we define g(y) := f(y) if f(y) < f(x), and define g(y) := f(y) - 1 if
f(y) > f(x). (Note that f(y) cannot equal f(x) since y ≠ x and f is a
bijection.) It is easy to check that this map is also a bijection (why?),
and so X - {x} has equal cardinality with {i ∈ N : 1 ≤ i ≤ n - 1}. In
particular X - {x} has cardinality n - 1, as desired. □
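
The construction of g in this proof can be tried out on a concrete set. The Python sketch below is only an illustration under stated assumptions: a bijection f from X to {1, ..., n} is encoded as a dictionary, and the helper name `remove_point` is hypothetical.

```python
def remove_point(f, x):
    """Given a bijection f : X -> {1..n} as a dict, build g on X - {x}."""
    fx = f[x]
    g = {}
    for y, fy in f.items():
        if y == x:
            continue  # x itself is removed from the domain
        # Values below f(x) are kept; values above f(x) shift down by one.
        g[y] = fy if fy < fx else fy - 1
    return g

f = {"a": 1, "b": 2, "c": 3, "d": 4}
g = remove_point(f, "b")
print(g)  # {'a': 1, 'c': 2, 'd': 3}
```

The values of g fill {1, 2, 3} with no gaps and no repeats, which is the bijection the lemma asks for in this instance.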

Now we prove the proposition. 

Proof of Proposition 3.6.8. We induct on n. First suppose that n = 0.
Then X must be empty, and so X cannot have any non-zero cardinality.
Now suppose that the proposition is already proven for some n; we now
prove it for n++. Let X have cardinality n++, and suppose that X also
has some other cardinality m ≠ n++. By Lemma 3.6.9, X is non-empty,
and if x is any element of X, then X - {x} has cardinality n and also
has cardinality m - 1, by Lemma 3.6.9. By induction hypothesis, this
means that n = m - 1, which implies that m = n++, a contradiction.
This closes the induction. □

Thus, for instance, we now know, thanks to Propositions 3.6.4 and 
3.6.8, that the sets {0,1,2} and {3,4} do not have equal cardinality, 
since the first set has cardinality 3 and the second set has cardinality 2. 

Definition 3.6.10 (Finite sets). A set is finite iff it has cardinality n 
for some natural number n; otherwise, the set is called infinite. If X is 
a finite set, we use #(X) to denote the cardinality of X. 


⁴ Strictly speaking, n - 1 has not yet been defined in this text. For the purposes of
this lemma, we define n - 1 to be the unique natural number m such that m++ = n;
this m is given by Lemma 2.2.10.

Example 3.6.11. The sets {0, 1, 2} and {3, 4} are finite, as is the empty
set (0 is a natural number), and #({0, 1, 2}) = 3, #({3, 4}) = 2, and
#(∅) = 0.

Now we give an example of an infinite set. 

Theorem 3.6.12. The set of natural numbers N is infinite. 

Proof. Suppose for sake of contradiction that the set of natural numbers
N was finite, so it had some cardinality #(N) = n. Then there is a
bijection f from {i ∈ N : 1 ≤ i ≤ n} to N. One can show that the
sequence f(1), f(2), ..., f(n) is bounded, or more precisely that there
exists a natural number M such that f(i) ≤ M for all 1 ≤ i ≤ n
(Exercise 3.6.3). But then the natural number M + 1 is not equal to any
of the f(i), contradicting the hypothesis that f is a bijection. □
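
The bounding step can be illustrated on a concrete list of values. The following Python sketch is not part of the formal development: it assumes Python's built-in integers stand in for the natural numbers, and the helper name `bound` is hypothetical. It builds M by adding up the values, which is only one of several ways to obtain a bound.

```python
def bound(values):
    """Return a natural number M with v <= M for every v in values."""
    M = 0
    for v in values:
        # For natural numbers, M + v is at least M and at least v,
        # so the running total bounds everything seen so far.
        M = M + v
    return M

values = [7, 0, 3, 12]   # f(1), ..., f(4) for some f : {1, ..., 4} -> N
M = bound(values)
print(M)                 # 22
print(M + 1 in values)   # False: M + 1 is missed by f
```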

Remark 3.6.13. One can also use similar arguments to show that any 
unbounded set is infinite; for instance the rationals Q and the reals R 
(which we will construct in later chapters) are infinite. However, it is 
possible for some sets to be “more” infinite than others; see Section 8.3. 

Now we relate cardinality with the arithmetic of natural numbers.

Proposition 3.6.14 (Cardinal arithmetic).

(a) Let X be a finite set, and let x be an object which is not an element
of X. Then X ∪ {x} is finite and #(X ∪ {x}) = #(X) + 1.

(b) Let X and Y be finite sets. Then X ∪ Y is finite and #(X ∪
Y) ≤ #(X) + #(Y). If in addition X and Y are disjoint (i.e.,
X ∩ Y = ∅), then #(X ∪ Y) = #(X) + #(Y).

(c) Let X be a finite set, and let Y be a subset of X. Then Y is finite,
and #(Y) ≤ #(X). If in addition Y ≠ X (i.e., Y is a proper
subset of X), then we have #(Y) < #(X).

(d) If X is a finite set, and f : X → Y is a function, then f(X) is a
finite set with #(f(X)) ≤ #(X). If in addition f is one-to-one,
then #(f(X)) = #(X).

(e) Let X and Y be finite sets. Then the Cartesian product X × Y is finite
and #(X × Y) = #(X) × #(Y).

(f) Let X and Y be finite sets. Then the set Y^X (defined in Axiom
3.10) is finite and #(Y^X) = #(Y)^#(X).
Proof. See Exercise 3.6.4. □
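
Several parts of Proposition 3.6.14 can be spot-checked on small examples. The Python sketch below is an illustration rather than a proof: it assumes Python's `len` on a set agrees with the cardinality # of the text, it encodes a function from X to Y as a tuple of images, and the helper name `cardinal_checks` is hypothetical.

```python
from itertools import product

def cardinal_checks(X, Y):
    """Evaluate parts (b), (e), (f) of the proposition on finite sets X, Y."""
    # Each tuple of len(X) elements of Y encodes one function f : X -> Y.
    functions = list(product(Y, repeat=len(X)))
    return {
        "union_bound": len(X | Y) <= len(X) + len(Y),
        # The equality is only claimed when X and Y are disjoint.
        "disjoint_sum": (not X.isdisjoint(Y)) or len(X | Y) == len(X) + len(Y),
        "product": len(set(product(X, Y))) == len(X) * len(Y),
        "power": len(functions) == len(Y) ** len(X),
    }

print(cardinal_checks({"a", "b", "c"}, {"c", "d"}))
print(cardinal_checks({1, 2}, {3}))
```

Every entry comes out True for these inputs; a single failing example would contradict the proposition, while any number of passing examples does not replace the proof in Exercise 3.6.4.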

Remark 3.6.15. Proposition 3.6.14 suggests that there is another way
to define the arithmetic operations of natural numbers; not defined
recursively as in Definitions 2.2.1, 2.3.1, 2.3.11, but instead using the
notions of union, Cartesian product, and power set. This is the basis
of cardinal arithmetic, which is an alternative foundation for arithmetic
to the Peano arithmetic we have developed here; we will not develop
this arithmetic in this text, but we give some examples of how one would
work with this arithmetic in Exercises 3.6.5, 3.6.6.

This concludes our discussion of finite sets. We shall discuss infinite 
sets in Chapter 8, once we have constructed a few more examples of 
infinite sets (such as the integers, rationals and reals). 

— Exercises — 

Exercise 3.6.1. Prove Proposition 3.6.4. 

Exercise 3.6.2. Show that a set X has cardinality 0 if and only if X is the 
empty set. 

Exercise 3.6.3. Let n be a natural number, and let f : {i ∈ N : 1 ≤ i ≤ n} → N
be a function. Show that there exists a natural number M such that f(i) ≤ M
for all 1 ≤ i ≤ n. (Hint: induct on n. You may also want to peek at Lemma
5.1.14.) Thus finite subsets of the natural numbers are bounded.

Exercise 3.6.4. Prove Proposition 3.6.14. 

Exercise 3.6.5. Let A and B be sets. Show that A × B and B × A have equal
cardinality by constructing an explicit bijection between the two sets. Then
use Proposition 3.6.14 to conclude an alternate proof of Lemma 2.3.2.

Exercise 3.6.6. Let A, B, C be sets. Show that the sets (A^B)^C and A^(B×C)
have equal cardinality by constructing an explicit bijection between the two
sets. Conclude that (a^b)^c = a^(bc) for any natural numbers a, b, c. Use a similar
argument to also conclude a^b × a^c = a^(b+c).

Exercise 3.6.7. Let A and B be sets. Let us say that A has lesser or equal
cardinality to B if there exists an injection f : A → B from A to B. Show that
if A and B are finite sets, then A has lesser or equal cardinality to B if and
only if #(A) ≤ #(B).

Exercise 3.6.8. Let A and B be sets such that there exists an injection f :
A → B from A to B (i.e., A has lesser or equal cardinality to B). Show that
there then exists a surjection g : B → A from B to A. (The converse to this
statement requires the axiom of choice; see Exercise 8.4.3.)

Exercise 3.6.9. Let A and B be finite sets. Show that A ∪ B and A ∩ B are
also finite sets, and that #(A) + #(B) = #(A ∪ B) + #(A ∩ B).

Exercise 3.6.10. Let A_1, ..., A_n be finite sets such that #(⋃_{i ∈ {1,...,n}} A_i) > n.
Show that there exists i ∈ {1, ..., n} such that #(A_i) ≥ 2. (This is known as
the pigeonhole principle.)
Integers and rationals


4.1 The integers 

In Chapter 2 we built up most of the basic properties of the natural 
number system, but we have reached the limits of what one can do with 
just addition and multiplication. We would now like to introduce a new 
operation, that of subtraction, but to do that properly we will have to 
pass from the natural number system to a larger number system, that 
of the integers. 

Informally, the integers are what you can get by subtracting two
natural numbers; for instance, 3 - 5 should be an integer, as should
6 - 2. This is not a complete definition of the integers, because (a) it
doesn’t say when two differences are equal (for instance we should know
why 3 - 5 is equal to 2 - 4, but is not equal to 1 - 6), and (b) it doesn’t
say how to do arithmetic on these differences (how does one add 3 - 5
to 6 - 2?). Furthermore, (c) this definition is circular because it requires
a notion of subtraction, which we can only adequately define once the
integers are constructed. Fortunately, because of our prior experience
with integers we know what the answers to these questions should be.
To answer (a), we know from our advanced knowledge in algebra that
a - b = c - d happens exactly when a + d = c + b, so we can characterize
equality of differences using only the concept of addition. Similarly, to
answer (b) we know from algebra that (a - b) + (c - d) = (a + c) - (b + d)
and that (a - b)(c - d) = (ac + bd) - (ad + bc). So we will take advantage of
our foreknowledge by building all this into the definition of the integers,
as we shall do shortly.

We still have to resolve (c). To get around this problem we will use
the following work-around: we will temporarily write integers not as a
difference a - b, but instead use a new notation a — b to define integers,
where the — is a meaningless place-holder, similar to the comma in the
Cartesian co-ordinate notation (x, y) for points in the plane. Later when
we define subtraction we will see that a — b is in fact equal to a - b, and
so we can discard the notation —; it is only needed right now to avoid
circularity. (These devices are similar to the scaffolding used to construct 
a building; they are temporarily essential to make sure the building is 
built correctly, but once the building is completed they are thrown away 
and never used again.) This may seem unnecessarily complicated in 
order to define something that we already are very familiar with, but we 
will use this device again to construct the rationals, and knowing these 
kinds of constructions will be very helpful in later chapters. 

Definition 4.1.1 (Integers). An integer is an expression¹ of the form
a — b, where a and b are natural numbers. Two integers are considered
to be equal, a — b = c — d, if and only if a + d = c + b. We let Z denote
the set of all integers.

Thus for instance 3 — 5 is an integer, and is equal to 2 — 4, because
3 + 4 = 2 + 5. On the other hand, 3 — 5 is not equal to 2 — 3 because
3 + 3 ≠ 2 + 5. This notation is strange looking, and has a few deficiencies;
for instance, 3 is not yet an integer, because it is not of the form a — b!
We will rectify these problems later.
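
The equality test of Definition 4.1.1 uses nothing beyond addition of natural numbers, which a short Python sketch makes concrete. This is an informal illustration under stated assumptions: a formal difference is encoded as a pair (a, b) of non-negative Python integers, and the helper name `same_integer` is hypothetical.

```python
def same_integer(p, q):
    """Decide whether two formal differences, given as pairs, are equal."""
    (a, b), (c, d) = p, q
    # Only addition of natural numbers is used, never subtraction.
    return a + d == c + b

print(same_integer((3, 5), (2, 4)))  # True, since 3 + 4 = 2 + 5
print(same_integer((3, 5), (2, 3)))  # False, since 3 + 3 != 2 + 5
```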

We have to check that this is a legitimate notion of equality. We
need to verify the reflexivity, symmetry, transitivity, and substitution
axioms (see Section A.7). We leave reflexivity and symmetry to Exercise
4.1.1 and instead verify the transitivity axiom. Suppose we know that
a — b = c — d and c — d = e — f. Then we have a + d = c + b and
c + f = d + e. Adding the two equations together we obtain a + d + c + f =
c + b + d + e. By Proposition 2.2.6 we can cancel the c and d, obtaining
a + f = b + e, i.e., a — b = e — f. Thus the cancellation law was
needed to make sure that our notion of equality is sound. As for the
substitution axiom, we cannot verify it at this stage because we have not
yet defined any operations on the integers. However, when we do define
our basic operations on the integers, such as addition, multiplication,
and order, we will have to verify the substitution axiom at that time in
order to ensure that the definition is valid. (We will only need to do
this for the basic operations; more advanced operations on the integers,
such as exponentiation, will be defined in terms of the basic ones, and
so we do not need to re-verify the substitution axiom for the advanced
operations.)

¹ In the language of set theory, what we are doing here is starting with the space
N × N of ordered pairs (a, b) of natural numbers. Then we place an equivalence
relation ~ on these pairs by declaring (a, b) ~ (c, d) iff a + d = c + b. The set-theoretic
interpretation of the symbol a — b is that it is the space of all pairs equivalent to (a, b):
a — b := {(c, d) ∈ N × N : (a, b) ~ (c, d)}. However, this interpretation plays no role
in how we manipulate the integers and we will not refer to it again. A similar
set-theoretic interpretation can be given to the construction of the rational numbers later
in this chapter, or the real numbers in the next chapter.

Now we define two basic arithmetic operations on integers: addition 
and multiplication. 

Definition 4.1.2. The sum of two integers, (a — b) + (c — d), is defined
by the formula

(a — b) + (c — d) := (a + c) — (b + d).

The product of two integers, (a — b) × (c — d), is defined by

(a — b) × (c — d) := (ac + bd) — (ad + bc).

Thus for instance, (3 — 5) + (1 — 4) is equal to (4 — 9). There is 
however one thing we have to check before we can accept these definitions 
- we have to check that if we replace one of the integers by an equal 
integer, that the sum or product does not change. For instance, (3 — 5) 
is equal to (2 — 4), so (3 — 5) + (1 — 4) ought to have the same value as 
(2 — 4) + (1 — 4), otherwise this would not give a consistent definition 
of addition. Fortunately, this is the case: 

Lemma 4.1.3 (Addition and multiplication are well-defined). Let
a, b, a', b', c, d be natural numbers. If (a — b) = (a' — b'), then (a — b) +
(c — d) = (a' — b') + (c — d) and (a — b) × (c — d) = (a' — b') × (c — d),
and also (c — d) + (a — b) = (c — d) + (a' — b') and (c — d) × (a — b) =
(c — d) × (a' — b'). Thus addition and multiplication are well-defined
operations (equal inputs give equal outputs).

Proof. To prove that (a — b) + (c — d) = (a' — b') + (c — d), we evaluate
both sides as (a + c) — (b + d) and (a' + c) — (b' + d). Thus we need to
show that a + c + b' + d = a' + c + b + d. But since (a — b) = (a' — b'), we
have a + b' = a' + b, and so by adding c + d to both sides we obtain the
claim. Now we show that (a — b) × (c — d) = (a' — b') × (c — d). Both
sides evaluate to (ac + bd) — (ad + bc) and (a'c + b'd) — (a'd + b'c), so
we have to show that ac + bd + a'd + b'c = a'c + b'd + ad + bc. But the
left-hand side factors as c(a + b') + d(a' + b), while the right factors as
c(a' + b) + d(a + b'). Since a + b' = a' + b, the two sides are equal. The
other two identities are proven similarly. □
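
The statement of Lemma 4.1.3 can be checked exhaustively on a small range of inputs. The Python sketch below is an illustration and not a substitute for the proof: formal differences are encoded as pairs of non-negative Python integers, and the names `eq`, `add`, `mul`, and `well_defined` are hypothetical.

```python
from itertools import product

def eq(p, q):
    """Equality of formal differences, as in Definition 4.1.1."""
    return p[0] + q[1] == q[0] + p[1]

def add(p, q):
    (a, b), (c, d) = p, q
    return (a + c, b + d)

def mul(p, q):
    (a, b), (c, d) = p, q
    return (a * c + b * d, a * d + b * c)

def well_defined(limit=5):
    """Check equal inputs give equal outputs for all pairs below limit."""
    pairs = list(product(range(limit), repeat=2))
    for p, p2, q in product(pairs, repeat=3):
        if eq(p, p2):
            if not (eq(add(p, q), add(p2, q)) and eq(mul(p, q), mul(p2, q))):
                return False
            if not (eq(add(q, p), add(q, p2)) and eq(mul(q, p), mul(q, p2))):
                return False
    return True

print(add((3, 5), (1, 4)))  # (4, 9)
print(well_defined())       # True
```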

The integers n — 0 behave in the same way as the natural numbers
n; indeed one can check that (n — 0) + (m — 0) = (n + m) — 0 and
(n — 0) × (m — 0) = nm — 0. Furthermore, (n — 0) is equal to (m — 0)
if and only if n = m. (The mathematical term for this is that there is an 
isomorphism between the natural numbers n and those integers of the 
form n — 0.) Thus we may identify the natural numbers with integers 
by setting n = n — 0; this does not affect our definitions of addition or 
multiplication or equality since they are consistent with each other. For 
instance the natural number 3 is now considered to be the same as the 
integer 3 — 0, thus 3 = 3 — 0. In particular 0 is equal to 0 — 0 and 1 is 
equal to 1 — 0. Of course, if we set n equal to n — 0, then it will also 
be equal to any other integer which is equal to n — 0, for instance 3 is 
equal not only to 3 — 0, but also to 4 — 1, 5 — 2, etc. 

We can now define incrementation on the integers by defining x++ :=
x + 1 for any integer x; this is of course consistent with our definition of
the increment operation for natural numbers. However, this is no longer
an important operation for us, as it has been now superseded by the
more general notion of addition.

Now we consider some other basic operations on the integers. 

Definition 4.1.4 (Negation of integers). If (a — b) is an integer, we
define the negation -(a — b) to be the integer (b — a). In particular
if n = n — 0 is a positive natural number, we can define its negation
-n = 0 — n.

For instance -(3 — 5) = (5 — 3). One can check this definition is
well-defined (Exercise 4.1.2).

We can now show that the integers correspond exactly to what we 
expect. 

Lemma 4.1.5 (Trichotomy of integers). Let x be an integer. Then
exactly one of the following three statements is true: (a) x is zero; (b)
x is equal to a positive natural number n; or (c) x is the negation -n
of a positive natural number n.

Proof. We first show that at least one of (a), (b), (c) is true. By
definition, x = a — b for some natural numbers a, b. We have three cases:
a > b, a = b, or a < b. If a > b then a = b + c for some positive natural
number c, which means that a — b = c — 0 = c, which is (b). If a = b,
then a — b = a — a = 0 — 0 = 0, which is (a). If a < b, then b > a, so
that b — a = n for some positive natural number n by the previous reasoning,
and thus a — b = -n, which is (c).

Now we show that no more than one of (a), (b), (c) can hold at a
time. By definition, a positive natural number is non-zero, so (a) and
(b) cannot simultaneously be true. If (a) and (c) were simultaneously
true, then 0 = -n for some positive natural n; thus (0 — 0) = (0 — n),
so that 0 + n = 0 + 0, so that n = 0, a contradiction. If (b) and (c)
were simultaneously true, then n = -m for some positive n, m, so that
(n — 0) = (0 — m), so that n + m = 0 + 0, which contradicts Proposition
2.2.8. Thus exactly one of (a), (b), (c) is true for any integer x. □
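
The case analysis in the first half of this proof translates directly into a small procedure. The Python sketch below is an illustration under stated assumptions: the integer is given by a pair (a, b) of non-negative Python integers, Python's subtraction is only ever applied to a larger number minus a smaller one (standing in for the natural number c with a = b + c), and the helper name `classify` is hypothetical.

```python
def classify(a, b):
    """Sort the formal difference of a and b into case (a), (b), or (c)."""
    if a > b:
        return ("positive", a - b)  # a = b + c, so the integer equals c
    if a == b:
        return ("zero", 0)
    return ("negative", b - a)      # b = a + n, so the integer equals -n

print(classify(6, 2))  # ('positive', 4)
print(classify(4, 4))  # ('zero', 0)
print(classify(3, 5))  # ('negative', 2)
```

Exactly one branch is taken for any input, mirroring the "at least one" half of the lemma; the "no more than one" half is what guarantees that equal integers are always classified the same way.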

If n is a positive natural number, we call -n a negative integer.
Thus every integer is positive, zero, or negative, but not more than one 
of these at a time. 

One could well ask why we don’t use Lemma 4.1.5 to define the
integers; i.e., why didn’t we just say an integer is anything which is either
a positive natural number, zero, or the negative of a natural number.
The reason is that if we did so, the rules for adding and multiplying
integers would split into many different cases (e.g., negative times
positive equals negative; negative plus positive is either negative, positive,
or zero, depending on which term is larger, etc.) and to verify all the
properties would end up being much messier.

We now summarize the algebraic properties of the integers. 

Proposition 4.1.6 (Laws of algebra for integers). Let x, y, z be integers.
Then we have

x + y = y + x
(x + y) + z = x + (y + z)
x + 0 = 0 + x = x
x + (-x) = (-x) + x = 0
xy = yx
(xy)z = x(yz)
x1 = 1x = x
x(y + z) = xy + xz
(y + z)x = yx + zx.


Remark 4.1.7. The above set of nine identities has a name; they are
asserting that the integers form a commutative ring. (If one deleted the
identity xy = yx, then they would only assert that the integers form a
ring). Note that some of these identities were already proven for the
natural numbers, but this does not automatically mean that they also
hold for the integers because the integers are a larger set than the natural
numbers. On the other hand, this proposition supersedes many of the
propositions derived earlier for natural numbers.

Proof. There are two ways to prove these identities. One is to use
Lemma 4.1.5 and split into a lot of cases depending on whether x, y, z
are zero, positive, or negative. This becomes very messy. A shorter
way is to write x = (a — b), y = (c — d), and z = (e — f) for some
natural numbers a, b, c, d, e, f, and expand these identities in terms of
a, b, c, d, e, f and use the algebra of the natural numbers. This allows
each identity to be proven in a few lines. We shall just prove the longest
one, namely (xy)z = x(yz):

(xy)z = ((a — b)(c — d))(e — f)
      = ((ac + bd) — (ad + bc))(e — f)
      = ((ace + bde + adf + bcf) — (acf + bdf + ade + bce));
x(yz) = (a — b)((c — d)(e — f))
      = (a — b)((ce + df) — (cf + de))
      = ((ace + adf + bcf + bde) — (acf + ade + bce + bdf))

and so one can see that (xy)z and x(yz) are equal. The other identities 
are proven in a similar fashion; see Exercise 4.1.4. □ 
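
As a sanity check on expansions of this kind, one can test the identities numerically on a small range of inputs. The Python sketch below is an illustration only: integers are encoded as pairs of non-negative Python integers, and the names `eq`, `add`, `mul`, and `laws_hold` are hypothetical. It tests associativity of multiplication and one distributive law.

```python
from itertools import product

def eq(p, q):
    """Equality of formal differences, as in Definition 4.1.1."""
    return p[0] + q[1] == q[0] + p[1]

def add(p, q):
    (a, b), (c, d) = p, q
    return (a + c, b + d)

def mul(p, q):
    (a, b), (c, d) = p, q
    return (a * c + b * d, a * d + b * c)

def laws_hold(limit=3):
    """Test (xy)z = x(yz) and x(y + z) = xy + xz for all small pairs."""
    pairs = list(product(range(limit), repeat=2))
    for x, y, z in product(pairs, repeat=3):
        if not eq(mul(mul(x, y), z), mul(x, mul(y, z))):
            return False
        if not eq(mul(x, add(y, z)), add(mul(x, y), mul(x, z))):
            return False
    return True

print(laws_hold())  # True
```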

We now define the operation of subtraction x - y of two integers by
the formula

x - y := x + (-y).

We do not need to verify the substitution axiom for this operation, since
we have defined subtraction in terms of two other operations on integers,
namely addition and negation, and we have already verified that those
operations are well-defined.

One can easily check now that if a and b are natural numbers, then

a - b = a + (-b) = (a — 0) + (0 — b) = a — b,

and so a — b is just the same thing as a - b. Because of this we can now
discard the — notation, and use the familiar operation of subtraction
instead. (As remarked before, we could not use subtraction immediately
because it would be circular.)

We can now generalize Lemma 2.3.3 and Corollary 2.3.7 from the 
natural numbers to the integers: 

Proposition 4.1.8 (Integers have no zero divisors). Let a and b be 
integers such that ab = 0. Then either a = 0 or b = 0 (or both). 

Proof. See Exercise 4.1.5. □ 

Corollary 4.1.9 (Cancellation law for integers). If a, b, c are integers
such that ac = bc and c is non-zero, then a = b.

Proof. See Exercise 4.1.6. □ 

We now extend the notion of order, which was defined on the natural 
numbers, to the integers by repeating the definition verbatim: 

Definition 4.1.10 (Ordering of the integers). Let n and m be integers.
We say that n is greater than or equal to m, and write n ≥ m or m ≤ n,
iff we have n = m + a for some natural number a. We say that n is
strictly greater than m, and write n > m or m < n, iff n ≥ m and
n ≠ m.

Thus for instance 5 > -3, because 5 = -3 + 8 and 5 ≠ -3. Clearly
this definition is consistent with the notion of order on the natural
numbers, since we are using the same definition.

Using the laws of algebra in Proposition 4.1.6 it is not hard to show 
the following properties of order: 

Lemma 4.1.11 (Properties of order). Let a,b,c be integers. 

(a) a > b if and only if a - b is a positive natural number.

(b) (Addition preserves order) If a > b, then a + c > b + c.

(c) (Positive multiplication preserves order) If a > b and c is positive,
then ac > bc.

(d) (Negation reverses order) If a > b, then -a < -b.

(e) (Order is transitive) If a > b and b > c, then a > c.

(f) (Order trichotomy) Exactly one of the statements a > b, a < b, or
a = b is true.

Proof. See Exercise 4.1.7. □ 


— Exercises — 

Exercise 4.1.1. Verify that the definition of equality on the integers is both 
reflexive and symmetric. 

Exercise 4.1.2. Show that the definition of negation on the integers is
well-defined in the sense that if (a — b) = (a' — b'), then -(a — b) = -(a' — b') (so
equal integers have equal negations).

Exercise 4.1.3. Show that (-1) × a = -a for every integer a.

Exercise 4.1.4. Prove the remaining identities in Proposition 4.1.6. (Hint: one 
can save some work by using some identities to prove others. For instance, 
once you know that xy = yx, you get for free that x1 = 1x, and once you also
prove x(y + z) = xy + xz, you automatically get (y + z)x = yx + zx for free.) 

Exercise 4.1.5. Prove Proposition 4.1.8. (Hint: while this proposition is not 
quite the same as Lemma 2.3.3, it is certainly legitimate to use Lemma 2.3.3 
in the course of proving Proposition 4.1.8.) 

Exercise 4.1.6. Prove Corollary 4.1.9. (Hint: there are two ways to do this. 
One is to use Proposition 4.1.8 to conclude that a - b must be zero. Another
way is to combine Corollary 2.3.7 with Lemma 4.1.5.) 

Exercise 4.1.7. Prove Lemma 4.1.11. (Hint: use the first part of this lemma to 
prove all the others.) 

Exercise 4.1.8. Show that the principle of induction (Axiom 2.5) does not 
apply directly to the integers. More precisely, give an example of a property 
P(n) pertaining to an integer n such that P(0) is true, and that P(n) implies 
P(n++) for all integers n, but that P(n) is not true for all integers n. Thus
induction is not as useful a tool for dealing with the integers as it is with the 
natural numbers. (The situation becomes even worse with the rational and real 
numbers, which we shall define shortly.) 


4.2 The rationals 

We have now constructed the integers, with the operations of addition, 
subtraction, multiplication, and order and verified all the expected alge- 
braic and order-theoretic properties. Now we will use a similar construc- 
tion to build the rationals, adding division to our mix of operations. 

Just like the integers were constructed by subtracting two natural
numbers, the rationals can be constructed by dividing two integers,
though of course we have to make the usual caveat that the denominator
should be non-zero². Of course, just as two differences a - b and
c - d can be equal if a + d = c + b, we know (from more advanced
knowledge) that two quotients a/b and c/d can be equal if ad = bc. Thus,
in analogy with the integers, we create a new meaningless symbol //
(which will eventually be superseded by division), and define

Definition 4.2.1. A rational number is an expression of the form a//b,
where a and b are integers and b is non-zero; a//0 is not considered to
be a rational number. Two rational numbers are considered to be equal,
a//b = c//d, if and only if ad = cb. The set of all rational numbers is
denoted Q.

Thus for instance 3//4 = 6//8 = (-3)//(-4), but 3//4 ≠ 4//3. This
is a valid definition of equality (Exercise 4.2.1). Now we need a notion
of addition, multiplication, and negation. Again, we will take advantage
of our pre-existing knowledge, which tells us that a/b + c/d should equal
(ad + bc)/(bd) and that a/b * c/d should equal ac/bd, while -(a/b) equals
(-a)/b. Motivated by this foreknowledge, we define

Definition 4.2.2. If a//b and c//d are rational numbers, we define their
sum

(a//b) + (c//d) := (ad + bc)//(bd)

their product

(a//b) * (c//d) := (ac)//(bd)

and the negation

-(a//b) := (-a)//b.

Note that if b and d are non-zero, then bd is also non-zero, by Propo- 
sition 4.1.8, so the sum or product of two rational numbers remains a 
rational number. 
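
Definition 4.2.2 can be tried out directly on pairs of integers. The Python sketch below is an illustration under stated assumptions: a rational a//b is encoded as a pair (a, b) with b non-zero, the names `q_add`, `q_mul`, `q_neg`, and `as_fraction` are hypothetical, and the standard library's `Fraction` type is used only as an independent reference for comparison.

```python
from fractions import Fraction

def q_add(p, q):
    (a, b), (c, d) = p, q
    return (a * d + b * c, b * d)

def q_mul(p, q):
    (a, b), (c, d) = p, q
    return (a * c, b * d)

def q_neg(p):
    a, b = p
    return (-a, b)

def as_fraction(p):
    """Convert a pair to a Fraction, used here only for cross-checking."""
    return Fraction(p[0], p[1])

x, y = (3, 4), (1, 6)
print(q_add(x, y))  # (22, 24)
print(q_mul(x, y))  # (3, 24)
print(q_neg(x))     # (-3, 4)
print(as_fraction(q_add(x, y)) == Fraction(3, 4) + Fraction(1, 6))  # True
```

Note that the pairs are not reduced to lowest terms; (22, 24) and (11, 12) are different pairs representing equal rationals in the sense of Definition 4.2.1.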

Lemma 4.2.3. The sum, product, and negation operations on rational
numbers are well-defined, in the sense that if one replaces a//b with
another rational number a'//b' which is equal to a//b, then the output
of the above operations remains unchanged, and similarly for c//d.

² There is no reasonable way we can divide by zero, since one cannot have both
the identities (a/b) * b = a and c * 0 = 0 hold simultaneously if b is allowed to be zero.
However, we can eventually get a reasonable notion of dividing by a quantity which
approaches zero - think of L’Hôpital’s rule (see Section 10.5), which suffices for doing
things like defining differentiation.

Proof. We just verify this for addition; we leave the remaining claims
to Exercise 4.2.2. Suppose a//b = a'//b', so that b and b' are non-zero
and ab' = a'b. We now show that a//b + c//d = a'//b' + c//d. By
definition, the left-hand side is (ad + bc)//bd and the right-hand side is
(a'd + b'c)//b'd, so we have to show that

(ad + bc)b'd = (a'd + b'c)bd,

which expands to

ab'd^2 + bb'cd = a'bd^2 + bb'cd.

But since ab' = a'b, the claim follows. Similarly if one replaces c//d by
c'//d'. □
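
The well-definedness of addition can also be tested exhaustively on a small range of numerators and denominators. The Python sketch below is an illustration and not a proof: rationals are encoded as pairs (a, b) of Python integers with b non-zero, and the names `q_eq`, `q_add`, and `addition_well_defined` are hypothetical.

```python
from itertools import product

def q_eq(p, q):
    """Equality of rationals as in Definition 4.2.1: ad = cb."""
    return p[0] * q[1] == q[0] * p[1]

def q_add(p, q):
    (a, b), (c, d) = p, q
    return (a * d + b * c, b * d)

def addition_well_defined(limit=3):
    """Check that replacing a summand by an equal rational keeps the sum equal."""
    nums = range(-limit, limit + 1)
    dens = [b for b in nums if b != 0]
    rationals = list(product(nums, dens))
    for x, x2, y in product(rationals, repeat=3):
        if q_eq(x, x2):
            if not q_eq(q_add(x, y), q_add(x2, y)):
                return False
            if not q_eq(q_add(y, x), q_add(y, x2)):
                return False
    return True

print(addition_well_defined())  # True
```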

We note that the rational numbers a//1 behave in a manner identical
to the integers a:

(a//1) + (b//1) = (a + b)//1;
(a//1) × (b//1) = (ab)//1;
-(a//1) = (-a)//1.

Also, a//1 and b//1 are only equal when a and b are equal. Because of
this, we will identify a with a//1 for each integer a: a = a//1; the above
identities then guarantee that the arithmetic of the integers is consistent
with the arithmetic of the rationals. Thus just as we embedded
the natural numbers inside the integers, we embed the integers inside
the rational numbers. In particular, all natural numbers are rational
numbers, for instance 0 is equal to 0//1 and 1 is equal to 1//1.

Observe that a rational number a//b is equal to 0 = 0//1 if and only
if a × 1 = b × 0, i.e., if the numerator a is equal to 0. Thus if a and b
are non-zero then so is a//b.

We now define a new operation on the rationals: reciprocal. If x = 
a/ /b is a non-zero rational (so that a, b / 0) then we define the reciprocal 
x~ 1 of x to be the rational number x~ l := b/ /a. It is easy to check that 
this operation is consistent with our notion of equality: if two rational 
numbers a/ /b, a! / /b' are equal, then their reciprocals are also equal. 
(In contrast, an operation such as “numerator” is not well-defined: the 
rationals 3//4 and 6//8 are equal, but have unequal numerators, so we 
have to be careful when referring to such terms as “the numerator of 
x” .) We however leave the reciprocal of 0 undefined. 
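The contrast between the reciprocal (well-defined) and the “numerator” (not well-defined) can be made concrete. This is a minimal sketch, again assuming that a//b is represented by an integer pair (a, b); the function names are hypothetical.

```python
# a//b is modelled as a pair (a, b) of ints with b != 0 (an assumption
# made for illustration; the function names are not from the text).

def eq(p, q):
    """a//b equals c//d iff ad = cb."""
    (a, b), (c, d) = p, q
    return a * d == c * b

def reciprocal(p):
    """(a//b)^{-1} := b//a, defined only when a != 0."""
    a, b = p
    if a == 0:
        raise ZeroDivisionError("the reciprocal of 0 is left undefined")
    return (b, a)

def numerator(p):
    """Reads off the first entry of the pair; depends on the representative."""
    return p[0]

x, x_alt = (3, 4), (6, 8)   # equal as rationals

reciprocals_agree = eq(reciprocal(x), reciprocal(x_alt))   # 4//3 versus 8//6
numerators_agree = numerator(x) == numerator(x_alt)        # 3 versus 6
```

Here reciprocals_agree is true while numerators_agree is false, which is exactly the distinction drawn in the paragraph above.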
We now summarize the algebraic properties of the rationals.

Proposition 4.2.4 (Laws of algebra for rationals). Let x, y, z be
rationals. Then the following laws of algebra hold:

x + y = y + x
(x + y) + z = x + (y + z)
x + 0 = 0 + x = x
x + (−x) = (−x) + x = 0
xy = yx
(xy)z = x(yz)
x1 = 1x = x
x(y + z) = xy + xz
(y + z)x = yx + zx.

If x is non-zero, we also have

x x^{-1} = x^{-1} x = 1.


Remark 4.2.5. The above set of ten identities has a name; they
assert that the rationals Q form a field. This is better than being a
commutative ring because of the tenth identity x x^{-1} = x^{-1} x = 1. Note
that this proposition supersedes Proposition 4.1.6.

Proof. To prove these identities, one writes x = a//b, y = c//d, z = e//f
for some integers a, c, e and non-zero integers b, d, f, and verifies each
identity in turn using the algebra of the integers. We shall just prove
the longest one, namely (x + y) + z = x + (y + z):

(x + y) + z = ((a//b) + (c//d)) + (e//f)
            = ((ad + bc)//bd) + (e//f)
            = (adf + bcf + bde)//bdf;
x + (y + z) = (a//b) + ((c//d) + (e//f))
            = (a//b) + ((cf + de)//df)
            = (adf + bcf + bde)//bdf

and so one can see that (x + y) + z and x + (y + z) are equal. The
other identities are proven in a similar fashion and are left to Exercise
4.2.3. □
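As a sanity check (not a proof), the ten identities of Proposition 4.2.4 can be tested on a small sample of rationals. The sketch below assumes Python's standard fractions.Fraction type as a stand-in for the rationals constructed in the text; the sample set and the name field_laws_hold are arbitrary choices.

```python
from fractions import Fraction
from itertools import product

# A small, arbitrary sample of rationals (includes 0 and negatives).
samples = [Fraction(n, d) for n in range(-3, 4) for d in (1, 2, 3)]

def field_laws_hold(x, y, z):
    """Check the ten identities of Proposition 4.2.4 for one triple."""
    return all([
        x + y == y + x,
        (x + y) + z == x + (y + z),
        x + 0 == 0 + x == x,
        x + (-x) == (-x) + x == 0,
        x * y == y * x,
        (x * y) * z == x * (y * z),
        x * 1 == 1 * x == x,
        x * (y + z) == x * y + x * z,
        (y + z) * x == y * x + z * x,
        # the tenth identity only applies to non-zero x
        x == 0 or x * (1 / x) == (1 / x) * x == 1,
    ])

all_ok = all(field_laws_hold(x, y, z)
             for x, y, z in product(samples, repeat=3))
```

Passing on finitely many triples only illustrates the laws; the proposition itself rests on the integer algebra used in the proof.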
We can now define the quotient x/y of two rational numbers x and
y, provided that y is non-zero, by the formula

x/y := x × y^{-1}.

Thus, for instance

(3//4)/(5//6) = (3//4) × (6//5) = (18//20) = (9//10).

Using this formula, it is easy to see that a/b = a//b for every integer a
and every non-zero integer b. Thus we can now discard the // notation,
and use the more customary a/b instead of a//b.

In a similar spirit, we define subtraction on the rationals by the
formula

x − y := x + (−y),

just as we did with the integers.

Proposition 4.2.4 allows us to use all the normal rules of algebra; we 
will now proceed to do so without further comment. 

In the previous section we organized the integers into positive, zero, 
and negative numbers. We now do the same for the rationals. 

Definition 4.2.6. A rational number x is said to be positive iff we have
x = a/b for some positive integers a and b. It is said to be negative iff
we have x = −y for some positive rational y (i.e., x = (−a)/b for some
positive integers a and b).

Thus for instance, every positive integer is a positive rational num- 
ber, and every negative integer is a negative rational number, so our 
new definition is consistent with our old one. 

Lemma 4.2.7 (Trichotomy of rationals). Let x be a rational number.
Then exactly one of the following three statements is true: (a) x is equal
to 0. (b) x is a positive rational number. (c) x is a negative rational
number.

Proof. See Exercise 4.2.4. □ 

Definition 4.2.8 (Ordering of the rationals). Let x and y be rational
numbers. We say that x > y iff x − y is a positive rational number, and
x < y iff x − y is a negative rational number. We write x ≥ y iff either
x > y or x = y, and similarly define x ≤ y.
Proposition 4.2.9 (Basic properties of order on the rationals). Let
x, y, z be rational numbers. Then the following properties hold.

(a) (Order trichotomy) Exactly one of the three statements x = y,
x < y, or x > y is true.

(b) (Order is anti-symmetric) One has x < y if and only if y > x.

(c) (Order is transitive) If x < y and y < z, then x < z.

(d) (Addition preserves order) If x < y, then x + z < y + z.

(e) (Positive multiplication preserves order) If x < y and z is positive,
then xz < yz.

Proof. See Exercise 4.2.5. □ 

Remark 4.2.10. The above five properties in Proposition 4.2.9, com- 
bined with the field axioms in Proposition 4.2.4, have a name: they 
assert that the rationals Q form an ordered field. It is important to 
keep in mind that Proposition 4.2.9(e) only works when z is positive, 
see Exercise 4.2.6. 
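The definitions of positivity and order, and the warning about Proposition 4.2.9(e), can be illustrated as follows. This sketch assumes fractions.Fraction as the model of the rationals and relies on the fact that Fraction stores every value in lowest terms with a positive denominator; the function names are illustrative.

```python
from fractions import Fraction

def is_positive(x):
    # x = a/b with a, b positive integers; Fraction keeps b > 0,
    # so positivity is decided by the sign of the numerator.
    return x.numerator > 0

def is_negative(x):
    # x is negative iff x = -y for some positive rational y.
    return is_positive(-x)

def less_than(x, y):
    # Definition 4.2.8: x < y iff x - y is a negative rational.
    return is_negative(x - y)

x, y = Fraction(1, 3), Fraction(1, 2)
z_pos, z_neg = Fraction(5, 7), Fraction(-5, 7)

# Multiplying by a positive z preserves x < y ...
preserved = less_than(x * z_pos, y * z_pos)
# ... while multiplying by a negative z reverses it (Exercise 4.2.6).
reversed_ = less_than(y * z_neg, x * z_neg)

# Trichotomy (Lemma 4.2.7) on a few samples: exactly one case holds.
trichotomy = all(
    [q == 0, is_positive(q), is_negative(q)].count(True) == 1
    for q in (Fraction(0), x, -x, z_neg)
)
```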


— Exercises — 

Exercise 4.2.1. Show that the definition of equality for the rational numbers 
is reflexive, symmetric, and transitive. (Hint: for transitivity, use Corollary 
4.1.9.) 

Exercise 4.2.2. Prove the remaining components of Lemma 4.2.3. 

Exercise 4.2.3. Prove the remaining components of Proposition 4.2.4. (Hint: 
as with Proposition 4.1.6, you can save some work by using some identities to 
prove others.) 

Exercise 4.2.4. Prove Lemma 4.2.7. (Note that, as in Proposition 2.2.13, you 
have to prove two different things: firstly, that at least one of (a), (b), (c) is 
true; and secondly, that at most one of (a), (b), (c) is true.) 

Exercise 4.2.5. Prove Proposition 4.2.9. 

Exercise 4.2.6. Show that if x, y, z are rational numbers such that x < y and
z is negative, then xz > yz.

4.3 Absolute value and exponentiation 

We have already introduced the four basic arithmetic operations of
addition, subtraction, multiplication, and division on the rationals.
(Recall that subtraction and division came from the more primitive
notions of negation and reciprocal by the formulae x − y := x + (−y) and
x/y := x × y^{-1}.) We also have a notion of order <, and have organized
the rationals into the positive rationals, the negative rationals, and zero.
In short, we have shown that the rationals Q form an ordered field.

One can now use these basic operations to construct more opera- 
tions. There are many such operations we can construct, but we shall 
just introduce two particularly useful ones: absolute value and exponen- 
tiation. 

Definition 4.3.1 (Absolute value). If x is a rational number, the
absolute value |x| of x is defined as follows. If x is positive, then |x| := x. If
x is negative, then |x| := −x. If x is zero, then |x| := 0.

Definition 4.3.2 (Distance). Let x and y be rational numbers. The
quantity |x − y| is called the distance between x and y and is sometimes
denoted d(x, y), thus d(x, y) := |x − y|. For instance, d(3, 5) = 2.

Proposition 4.3.3 (Basic properties of absolute value and distance). 
Let x, y, z be rational numbers. 

(a) (Non-degeneracy of absolute value) We have |x| ≥ 0. Also, |x| = 0
if and only if x is 0.

(b) (Triangle inequality for absolute value) We have |x + y| ≤ |x| + |y|.

(c) We have the inequalities −y ≤ x ≤ y if and only if y ≥ |x|. In
particular, we have −|x| ≤ x ≤ |x|.

(d) (Multiplicativity of absolute value) We have |xy| = |x| |y|. In
particular, |−x| = |x|.

(e) (Non-degeneracy of distance) We have d(x, y) ≥ 0. Also, d(x, y) =
0 if and only if x = y.

(f) (Symmetry of distance) d(x, y) = d(y, x).

(g) (Triangle inequality for distance) d(x, z) ≤ d(x, y) + d(y, z).

Proof. See Exercise 4.3.1. □ 

Absolute value is useful for measuring how “close” two numbers are. 
Let us make a somewhat artificial definition: 

Definition 4.3.4 (ε-closeness). Let ε > 0 be a rational number, and
let x, y be rational numbers. We say that y is ε-close to x iff we have
d(y, x) ≤ ε.





Remark 4.3.5. This definition is not standard in mathematics
textbooks; we will use it as “scaffolding” to construct the more important
notions of limits (and of Cauchy sequences) later on, and once we have
those more advanced notions we will discard the notion of ε-close.

Examples 4.3.6. The numbers 0.99 and 1.01 are 0.1-close, but they
are not 0.01-close, because d(0.99, 1.01) = |0.99 − 1.01| = 0.02 is larger
than 0.01. The numbers 2 and 2 are ε-close for every positive ε.

We do not bother defining a notion of ε-close when ε is zero or
negative, because if ε is zero then x and y are only ε-close when they are
equal, and when ε is negative then x and y are never ε-close. (In any
event it is a long-standing tradition in analysis that the Greek letters ε,
δ should only denote small positive numbers.)
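The distance function and the notion of ε-closeness translate directly into code. The following is a small sketch assuming fractions.Fraction for the rationals; dist and is_close are hypothetical names, and decimal strings such as "0.99" are parsed by Fraction into exact rationals.

```python
from fractions import Fraction

def dist(x, y):
    """d(x, y) := |x - y| (Definition 4.3.2)."""
    return abs(x - y)

def is_close(eps, x, y):
    """True iff y is eps-close to x, i.e. d(y, x) <= eps (Definition 4.3.4)."""
    if eps <= 0:
        raise ValueError("eps-closeness is only defined for eps > 0")
    return dist(y, x) <= eps

a, b = Fraction("0.99"), Fraction("1.01")   # exactly 99/100 and 101/100

close_at_tenth = is_close(Fraction("0.1"), a, b)        # d(a, b) = 0.02 <= 0.1
close_at_hundredth = is_close(Fraction("0.01"), a, b)   # 0.02 > 0.01
```

With these inputs close_at_tenth is true and close_at_hundredth is false, matching Examples 4.3.6.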

Some basic properties of ε-closeness are the following.

Proposition 4.3.7. Let x, y, z, w be rational numbers.

(a) If x = y, then x is ε-close to y for every ε > 0. Conversely, if x
is ε-close to y for every ε > 0, then we have x = y.

(b) Let ε > 0. If x is ε-close to y, then y is ε-close to x.

(c) Let ε, δ > 0. If x is ε-close to y, and y is δ-close to z, then x and
z are (ε + δ)-close.

(d) Let ε, δ > 0. If x and y are ε-close, and z and w are δ-close, then
x + z and y + w are (ε + δ)-close, and x − z and y − w are also
(ε + δ)-close.

(e) Let ε > 0. If x and y are ε-close, they are also ε'-close for every
ε' > ε.

(f) Let ε > 0. If y and z are both ε-close to x, and w is between y and
z (i.e., y ≤ w ≤ z or z ≤ w ≤ y), then w is also ε-close to x.

(g) Let ε > 0. If x and y are ε-close, and z is non-zero, then xz and
yz are ε|z|-close.

(h) Let ε, δ > 0. If x and y are ε-close, and z and w are δ-close, then
xz and yw are (ε|z| + δ|x| + εδ)-close.

Proof. We only prove the most difficult one, (h); we leave (a)-(g) to
Exercise 4.3.2. Let ε, δ > 0, and suppose that x and y are ε-close. If we
write a := y − x, then we have y = x + a and |a| ≤ ε. Similarly, if z
and w are δ-close, and we define b := w − z, then w = z + b and |b| ≤ δ.
Since y = x + a and w = z + b, we have

yw = (x + a)(z + b) = xz + az + xb + ab.

Thus

|yw − xz| = |az + bx + ab| ≤ |az| + |bx| + |ab| = |a||z| + |b||x| + |a||b|.

Since |a| ≤ ε and |b| ≤ δ, we thus have

|yw − xz| ≤ ε|z| + δ|x| + εδ

and thus that yw and xz are (ε|z| + δ|x| + εδ)-close. □
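A numerical instance may help in reading the bound in part (h). The sketch below assumes fractions.Fraction for the rationals; the particular values of x, z, ε and δ are arbitrary choices made for illustration, picked so that the errors a = y − x and b = w − z have the largest permitted size.

```python
from fractions import Fraction

def product_bound(eps, delta, x, z):
    """The closeness bound eps|z| + delta|x| + eps*delta from part (h)."""
    return eps * abs(z) + delta * abs(x) + eps * delta

eps, delta = Fraction(1, 10), Fraction(1, 20)
x, z = Fraction(3), Fraction(-7, 2)

# y is eps-close to x and w is delta-close to z, with errors of maximal size.
y = x + eps        # a := y - x = 1/10
w = z - delta      # b := w - z = -1/20

gap = abs(y * w - x * z)                  # |yw - xz|
bound = product_bound(eps, delta, x, z)   # eps|z| + delta|x| + eps*delta
```

In this instance gap and bound both equal 101/200, so the estimate in (h) cannot be improved in general.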

Remark 4.3.8. One should compare statements (a)-(c) of this
proposition with the reflexive, symmetric, and transitive axioms of equality.
It is often useful to think of the notion of “ε-close” as an approximate
substitute for that of equality in analysis.

Now we recursively define exponentiation for natural number
exponents, extending the previous definition in Definition 2.3.11.

Definition 4.3.9 (Exponentiation to a natural number). Let x be a
rational number. To raise x to the power 0, we define x^0 := 1; in
particular we define 0^0 := 1. Now suppose inductively that x^n has been
defined for some natural number n, then we define x^{n+1} := x^n × x.

Proposition 4.3.10 (Properties of exponentiation, I). Let x, y be
rational numbers, and let n, m be natural numbers.

(a) We have x^n x^m = x^{n+m}, (x^n)^m = x^{nm}, and (xy)^n = x^n y^n.

(b) Suppose n > 0. Then we have x^n = 0 if and only if x = 0.

(c) If x ≥ y ≥ 0, then x^n ≥ y^n ≥ 0. If x > y ≥ 0 and n > 0, then
x^n > y^n ≥ 0.

(d) We have |x^n| = |x|^n.

Proof. See Exercise 4.3.3. □ 


Now we define exponentiation for negative integer exponents. 
Definition 4.3.11 (Exponentiation to a negative number). Let x be a
non-zero rational number. Then for any negative integer −n, we define
x^{-n} := 1/x^n.

Thus for instance x^{-3} = 1/x^3 = 1/(x × x × x). We now have x^n
defined for any integer n, whether n is positive, negative, or zero.
Exponentiation with integer exponents has the following properties (which
supersede Proposition 4.3.10):

Proposition 4.3.12 (Properties of exponentiation, II). Let x, y be
non-zero rational numbers, and let n, m be integers.

(a) We have x^n x^m = x^{n+m}, (x^n)^m = x^{nm}, and (xy)^n = x^n y^n.

(b) If x ≥ y > 0, then x^n ≥ y^n > 0 if n is positive, and 0 < x^n ≤ y^n
if n is negative.

(c) If x, y > 0, n ≠ 0, and x^n = y^n, then x = y.

(d) We have |x^n| = |x|^n.

Proof. See Exercise 4.3.4. □ 
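The recursive definition of x^n for natural n (Definition 4.3.9) and its extension to negative exponents (Definition 4.3.11) can be written out as a short recursive function. This is a sketch assuming fractions.Fraction for the rationals; power is a hypothetical name, and the checks at the end are spot checks of Proposition 4.3.12, not proofs.

```python
from fractions import Fraction

def power(x, n):
    """x^n for rational x and integer n, following Definitions 4.3.9 and 4.3.11."""
    x = Fraction(x)
    if n < 0:
        if x == 0:
            raise ZeroDivisionError("negative powers of 0 are undefined")
        return 1 / power(x, -n)          # x^{-n} := 1/x^n
    if n == 0:
        return Fraction(1)               # x^0 := 1, in particular 0^0 := 1
    return power(x, n - 1) * x           # x^{n+1} := x^n * x

x, y = Fraction(2, 3), Fraction(-5, 4)
n, m = -3, 5

law_add = power(x, n) * power(x, m) == power(x, n + m)    # x^n x^m = x^{n+m}
law_mul = power(power(x, n), m) == power(x, n * m)        # (x^n)^m = x^{nm}
law_prod = power(x * y, n) == power(x, n) * power(y, n)   # (xy)^n = x^n y^n
law_abs = abs(power(y, n)) == power(abs(y), n)            # |x^n| = |x|^n
```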


— Exercises — 

Exercise 4.3.1. Prove Proposition 4.3.3. (Hint: while all of these claims can 
be proven by dividing into cases, such as when x is positive, negative, or zero, 
several parts of the proposition can be proven without such a tedious division 
into cases. For instance one can use earlier parts of the proposition to prove 
later ones.) 

Exercise 4.3.2. Prove the remaining claims in Proposition 4.3.7. 

Exercise 4.3.3. Prove Proposition 4.3.10. (Hint: use induction.) 

Exercise 4.3.4. Prove Proposition 4.3.12. (Hint: induction is not suitable here. 
Instead, use Proposition 4.3.10.) 

Exercise 4.3.5. Prove that 2^N ≥ N for all positive integers N. (Hint: use
induction.)

4.4 Gaps in the rational numbers 

Imagine that we arrange the rationals on a line, arranging x to the right 
of y if x > y. (This is a non-rigorous arrangement, since we have not 
yet defined the concept of a line, but this discussion is only intended 
to motivate the more rigorous propositions below.) Inside the rationals 
we have the integers, which are thus also arranged on the line. Now we 
work out how the rationals are arranged with respect to the integers. 
Proposition 4.4.1 (Interspersing of integers by rationals). Let x be a
rational number. Then there exists an integer n such that n ≤ x < n + 1.
In fact, this integer is unique (i.e., for each x there is only one n for
which n ≤ x < n + 1). In particular, there exists a natural number N
such that N > x (i.e., there is no such thing as a rational number which
is larger than all the natural numbers).

Remark 4.4.2. The integer n for which n ≤ x < n + 1 is sometimes
referred to as the integer part of x and is sometimes denoted n = ⌊x⌋.

Proof. See Exercise 4.4.1. □ 
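The integer part of a rational can be computed from a numerator and denominator. The sketch below assumes fractions.Fraction (which keeps the denominator positive) and uses Python's floor division on integers, which rounds towards minus infinity; integer_part and natural_above are illustrative names.

```python
from fractions import Fraction

def integer_part(x):
    """The unique integer n with n <= x < n + 1."""
    n = x.numerator // x.denominator   # floor division rounds down
    assert n <= x < n + 1
    return n

def natural_above(x):
    """A natural number N with N > x, as in the last claim of Proposition 4.4.1."""
    return max(integer_part(x) + 1, 0)

parts = [integer_part(q) for q in (Fraction(7, 2), Fraction(-7, 2), Fraction(5))]
```

Note that the integer part of −7/2 is −4, not −3, since −4 ≤ −7/2 < −3.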

Also, between every two rational numbers there is at least one addi- 
tional rational: 

Proposition 4.4.3 (Interspersing of rationals by rationals). If x and 
y are two rationals such that x < y, then there exists a third rational z 
such that x < z < y. 

Proof. We set z := (x + y)/2. Since x < y, and 1/2 = 1//2 is positive,
we have from Proposition 4.2.9 that x/2 < y/2. If we add y/2 to both
sides using Proposition 4.2.9 we obtain x/2 + y/2 < y/2 + y/2, i.e., z < y.
If we instead add x/2 to both sides we obtain x/2 + x/2 < y/2 + x/2,
i.e., x < z. Thus x < z < y as desired. □

Despite the rationals having this denseness property, they are still 
incomplete; there are still an infinite number of “gaps” or “holes” be- 
tween the rationals, although this denseness property does ensure that 
these holes are in some sense infinitely small. For instance, we will now 
show that the rational numbers do not contain any square root of two. 

Proposition 4.4.4. There does not exist any rational number x for
which x^2 = 2.

Proof. We only give a sketch of a proof; the gaps will be filled in Exercise
4.4.3. Suppose for sake of contradiction that we had a rational number
x for which x^2 = 2. Clearly x is not zero. We may assume that x is
positive, for if x were negative then we could just replace x by −x (since
x^2 = (−x)^2). Thus x = p/q for some positive integers p, q, so (p/q)^2 = 2,
which we can rearrange as p^2 = 2q^2. Define a natural number p to be
even if p = 2k for some natural number k, and odd if p = 2k + 1 for some
natural number k. Every natural number is either even or odd, but not
both (why?). If p is odd, then p^2 is also odd (why?), which contradicts
p^2 = 2q^2. Thus p is even, i.e., p = 2k for some natural number k. Since
p is positive, k must also be positive. Inserting p = 2k into p^2 = 2q^2 we
obtain 4k^2 = 2q^2, so that q^2 = 2k^2.

To summarize, we started with a pair (p, q) of positive integers such
that p^2 = 2q^2, and ended up with a pair (q, k) of positive integers such
that q^2 = 2k^2. Since p^2 = 2q^2, we have q < p (why?). If we rewrite
p' := q and q' := k, we thus can pass from one solution (p, q) to the
equation p^2 = 2q^2 to a new solution (p', q') to the same equation which
has a smaller value of p. But then we can repeat this procedure again
and again, obtaining a sequence (p'', q''), (p''', q'''), etc. of solutions to
p^2 = 2q^2, each one with a smaller value of p than the previous, and each
one consisting of positive integers. But this contradicts the principle of
infinite descent (see Exercise 4.4.2). This contradiction shows that we
could not have had a rational x for which x^2 = 2. □

On the other hand, we can get rational numbers which are arbitrarily 
close to a square root of 2: 

Proposition 4.4.5. For every rational number ε > 0, there exists a
non-negative rational number x such that x^2 < 2 < (x + ε)^2.

Proof. Let ε > 0 be rational. Suppose for sake of contradiction that
there is no non-negative rational number x for which x^2 < 2 < (x + ε)^2.
This means that whenever x is non-negative and x^2 < 2, we must also
have (x + ε)^2 < 2 (note that (x + ε)^2 cannot equal 2, by Proposition
4.4.4). Since 0^2 < 2, we thus have ε^2 < 2, which then implies (2ε)^2 < 2,
and indeed a simple induction shows that (nε)^2 < 2 for every natural
number n. (Note that nε is non-negative for every natural number n
- why?) But, by Proposition 4.4.1 we can find an integer n such that
n > 2/ε, which implies that nε > 2, which implies that (nε)^2 > 4 > 2,
contradicting the claim that (nε)^2 < 2 for all natural numbers n. This
contradiction gives the proof. □
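Although the proof above is by contradiction, it suggests a direct search: step through the multiples 0, ε, 2ε, ... and stop at the last one whose square is still below 2. The sketch below makes that search explicit, assuming fractions.Fraction for the rationals; sqrt2_lower_approx is a hypothetical name. The loop terminates because (nε)^2 > 4 as soon as n > 2/ε.

```python
from fractions import Fraction

def sqrt2_lower_approx(eps):
    """Return a non-negative rational x = n*eps with x^2 < 2 < (x + eps)^2."""
    if eps <= 0:
        raise ValueError("eps must be a positive rational")
    n = 0
    # Advance while the next multiple of eps still has square below 2.
    while ((n + 1) * eps) ** 2 < 2:
        n += 1
    # Now (n*eps)^2 < 2 and ((n+1)*eps)^2 >= 2; equality is excluded
    # by Proposition 4.4.4, so the second inequality is strict.
    return n * eps

eps = Fraction(1, 1000)
x = sqrt2_lower_approx(eps)
```

For ε = 0.001 the search stops at x = 1414/1000, the value 1.414 used in Example 4.4.6.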

Example 4.4.6. If³ ε = 0.001, we can take x = 1.414, since x^2 =
1.999396 and (x + ε)^2 = 2.002225.

3 We will use the decimal system for defining terminating decimals, for instance
1.414 is defined to equal the rational number 1414/1000. We defer the formal
discussion on the decimal system to an Appendix (§B).

Proposition 4.4.5 indicates that, while the set Q of rationals does
not actually have √2 as a member, we can get as close as we wish to
√2. For instance, the sequence of rationals

1.4, 1.41, 1.414, 1.4142, 1.41421, ...

seems to get closer and closer to √2, as their squares indicate:

1.96, 1.9881, 1.999396, 1.99996164, 1.9999899241, ...

Thus it seems that we can create a square root of 2 by taking a “limit” of 
a sequence of rationals. This is how we shall construct the real numbers 
in the next chapter. (There is another way to do so, using something 
called “Dedekind cuts”, which we will not pursue here. One can also 
proceed using infinite decimal expansions, but there are some sticky 
issues when doing so, e.g., one has to make 0.999 . . . equal to 1.000 . . ., 
and this approach, despite being the most familiar, is actually more 
complicated than other approaches; see the Appendix §B.) 

— Exercises — 

Exercise 4.4.1. Prove Proposition 4.4.1. (Hint: use Proposition 2.3.9.) 

Exercise 4.4.2. A definition: a sequence a_0, a_1, a_2, ... of numbers (natural
numbers, integers, rationals, or reals) is said to be in infinite descent if we have
a_n > a_{n+1} for all natural numbers n (i.e., a_0 > a_1 > a_2 > ...).

(a) Prove the principle of infinite descent: that it is not possible to have a
sequence of natural numbers which is in infinite descent. (Hint: assume
for sake of contradiction that you can find a sequence of natural numbers
which is in infinite descent. Since all the a_n are natural numbers, you
know that a_n ≥ 0 for all n. Now use induction to show in fact that
a_n ≥ k for all k ∈ N and all n ∈ N, and obtain a contradiction.)

(b) Does the principle of infinite descent work if the sequence a_1, a_2, a_3, ...
is allowed to take integer values instead of natural number values? What
about if it is allowed to take positive rational values instead of natural 
numbers? Explain. 

Exercise 4.4.3. Fill in the gaps marked (why?) in the proof of Proposition 
4.4.4. 
To review our progress to date, we have rigorously constructed three
fundamental number systems: the natural number system N, the integers
Z, and the rationals Q.¹ We defined the natural numbers using the five
Peano axioms, and postulated that such a number system existed; this
is plausible, since the natural numbers correspond to the very intuitive
and fundamental notion of sequential counting. Using that number
system one could then recursively define addition and multiplication, and
verify that they obeyed the usual laws of algebra. We then constructed
the integers by taking formal² differences of the natural numbers, a−−b.
We then constructed the rationals by taking formal quotients of the
integers, a//b, although we need to exclude division by zero in order
to keep the laws of algebra reasonable. (You are of course free to
design your own number system, possibly including one where division
by zero is permitted; but you will have to give up one or more of the
field axioms from Proposition 4.2.4, among other things, and you will
probably get a less useful number system in which to do any real-world
problems.)

1 The symbols N, Q, and R stand for “natural”, “quotient”, and “real”
respectively. Z stands for “Zahlen”, the German word for numbers. There are also the
complex numbers C, which obviously stands for “complex”.

2 Formal means “having the form of”; at the beginning of our construction the
expression a−−b did not actually mean the difference a − b, since the symbol − was
meaningless. It only had the form of a difference. Later on we defined subtraction
and verified that the formal difference was equal to the actual difference, so this
eventually became a non-issue, and our symbol for formal differencing was discarded.
Somewhat confusingly, this use of the term “formal” is unrelated to the notions of a
formal argument and an informal argument.

The rational system is already sufficient to do a lot of mathematics -
much of high school algebra, for instance, works just fine if one only
knows about the rationals. However, there is a fundamental area of
mathematics where the rational number system does not suffice - that of 
geometry (the study of lengths, areas, etc.). For instance, a right-angled 
triangle with both sides equal to 1 gives a hypotenuse of √2, which
is an irrational number, i.e., not a rational number; see Proposition
4.4.4. Things get even worse when one starts to deal with the sub-field
of geometry known as trigonometry, when one sees numbers such as π
or cos(1), which turn out to be in some sense “even more” irrational
than √2. (These numbers are known as transcendental numbers, but to
discuss this further would be far beyond the scope of this text.) Thus, in 
order to have a number system which can adequately describe geometry 
- or even something as simple as measuring lengths on a line - one needs 
to replace the rational number system with the real number system. 
Since differential and integral calculus is also intimately tied up with 
geometry - think of slopes of tangents, or areas under a curve - calculus 
also requires the real number system in order to function properly. 

However, a rigorous way to construct the reals from the rationals 
turns out to be somewhat difficult - requiring a bit more machinery 
than what was needed to pass from the naturals to the integers, or the 
integers to the rationals. In those two constructions, the task was to 
introduce one more algebraic operation to the number system - e.g., one 
can get integers from naturals by introducing subtraction, and get the 
rationals from the integers by introducing division. But to get the reals 
from the rationals is to pass from a “discrete” system to a “continuous” 
one, and requires the introduction of a somewhat different notion - that 
of a limit. The limit is a concept which on one level is quite intuitive, but 
to pin down rigorously turns out to be quite difficult. (Even such great 
mathematicians as Euler and Newton had difficulty with this concept. It 
was only in the nineteenth century that mathematicians such as Cauchy 
and Dedekind figured out how to deal with limits rigorously.) 

In Section 4.4 we explored the “gaps” in the rational numbers; now 
we shall fill in these gaps using limits to create the real numbers. The
real number system will end up being a lot like the rational numbers, but 
will have some new operations - notably that of supremum, which can 
then be used to define limits and thence to everything else that calculus 
needs. 

The procedure we give here of obtaining the real numbers as the 
limit of sequences of rational numbers may seem rather complicated. 
However, it is in fact an instance of a very general and useful procedure, 
that of completing one metric space to form another; see Exercise 11.6.8. 
5.1 Cauchy sequences

Our construction of the real numbers shall rely on the concept of a 
Cauchy sequence. Before we define this notion formally, let us first 
define the concept of a sequence. 

Definition 5.1.1 (Sequences). Let m be an integer. A sequence
(a_n)_{n=m}^∞ of rational numbers is any function from the set {n ∈ Z : n ≥
m} to Q, i.e., a mapping which assigns to each integer n greater than or
equal to m, a rational number a_n. More informally, a sequence (a_n)_{n=m}^∞
of rational numbers is a collection of rationals a_m, a_{m+1}, a_{m+2}, ....

Example 5.1.2. The sequence (n^2)_{n=0}^∞ is the collection 0, 1, 4, 9, ...
of natural numbers; the sequence (3)_{n=0}^∞ is the collection 3, 3, 3, ... of
natural numbers. These sequences are indexed starting from 0, but we
can of course make sequences starting from 1 or any other number; for
instance, the sequence (a_n)_{n=3}^∞ denotes the sequence a_3, a_4, a_5, ..., so
(n^2)_{n=3}^∞ is the collection 9, 16, 25, ... of natural numbers.

We want to define the real numbers as the limits of sequences of 
rational numbers. To do so, we have to distinguish which sequences 
of rationals are convergent and which ones are not. For instance, the 
sequence 

1.4, 1.41, 1.414, 1.4142, 1.41421, . . . 
looks like it is trying to converge to something, as does 

0.1, 0.01, 0.001, 0.0001, ...
while other sequences such as

1, 2, 4, 8, 16, ...
or

1, 0, 1, 0, 1, ...

do not. To do this we use the definition of ε-closeness defined earlier.
Recall from Definition 4.3.4 that two rational numbers x, y are ε-close
if d(x, y) = |x − y| ≤ ε.

Definition 5.1.3 (ε-steadiness). Let ε > 0. A sequence (a_n)_{n=0}^∞ is said
to be ε-steady iff each pair a_j, a_k of sequence elements is ε-close for
every natural number j, k. In other words, the sequence a_0, a_1, a_2, ... is
ε-steady iff d(a_j, a_k) ≤ ε for all j, k.
Remark 5.1.4. This definition is not standard in the literature; we will
not need it outside of this section; similarly for the concept of “eventual
ε-steadiness” below. We have defined ε-steadiness for sequences whose
index starts at 0, but clearly we can make a similar notion for sequences
whose indices start from any other number: a sequence a_N, a_{N+1}, ... is
ε-steady if one has d(a_j, a_k) ≤ ε for all j, k ≥ N.

Example 5.1.5. The sequence 1, 0, 1, 0, 1, ... is 1-steady, but is not
1/2-steady. The sequence 0.1, 0.01, 0.001, 0.0001, ... is 0.1-steady, but is not
0.01-steady (why?). The sequence 1, 2, 4, 8, 16, ... is not ε-steady for
any ε (why?). The sequence 2, 2, 2, 2, ... is ε-steady for every ε > 0.

The notion of ε-steadiness of a sequence is simple, but does not really
capture the limiting behavior of a sequence, because it is too sensitive
to the initial members of the sequence. For instance, the sequence

10, 0, 0, 0, 0, 0, ...

is 10-steady, but is not ε-steady for any smaller value of ε, despite the
sequence converging almost immediately to zero. So we need a more
robust notion of steadiness that does not care about the initial members
of a sequence.

Definition 5.1.6 (Eventual ε-steadiness). Let ε > 0. A
sequence (a_n)_{n=0}^∞ is said to be eventually ε-steady iff the sequence
a_N, a_{N+1}, a_{N+2}, ... is ε-steady for some natural number N ≥ 0. In
other words, the sequence a_0, a_1, a_2, ... is eventually ε-steady iff there
exists an N ≥ 0 such that d(a_j, a_k) ≤ ε for all j, k ≥ N.

Example 5.1.7. The sequence a_1, a_2, ... defined by a_n := 1/n (i.e.,
the sequence 1, 1/2, 1/3, 1/4, ...) is not 0.1-steady, but is eventually
0.1-steady, because the sequence a_{10}, a_{11}, a_{12}, ... (i.e., 1/10, 1/11, 1/12, ...)
is 0.1-steady. The sequence 10, 0, 0, 0, 0, ... is not ε-steady for any ε less
than 10, but it is eventually ε-steady for every ε > 0 (why?).
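Steadiness of a finite stretch of a sequence can be checked by brute force over all pairs. The sketch below assumes fractions.Fraction for the rationals; is_steady is a hypothetical name. A finite check of this kind can refute steadiness of the whole sequence, but it can only suggest, not establish, steadiness of an infinite tail.

```python
from fractions import Fraction
from itertools import combinations

def is_steady(eps, terms):
    """True iff every pair in the finite list `terms` is eps-close."""
    return all(abs(a - b) <= eps for a, b in combinations(terms, 2))

# The first 200 terms of a_n := 1/n, for n = 1, 2, ..., 200.
terms = [Fraction(1, n) for n in range(1, 201)]
eps = Fraction(1, 10)

whole = is_steady(eps, terms)       # fails already for the pair 1, 1/2
tail = is_steady(eps, terms[9:])    # the terms a_10, ..., a_200
```

Here whole is false and tail is true, in line with Example 5.1.7: every term from a_10 onwards lies between 0 and 1/10.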

Now we can finally define the correct notion of what it means for a 
sequence of rationals to “want” to converge. 

Definition 5.1.8 (Cauchy sequences). A sequence (a_n)_{n=0}^∞ of rational
numbers is said to be a Cauchy sequence iff for every rational ε > 0, the
sequence (a_n)_{n=0}^∞ is eventually ε-steady. In other words, the sequence
a_0, a_1, a_2, ... is a Cauchy sequence iff for every ε > 0, there exists an
N ≥ 0 such that d(a_j, a_k) ≤ ε for all j, k ≥ N.





5. The real numbers 


Remark 5.1.9. At present, the parameter ε is restricted to be a
positive rational; we cannot take ε to be an arbitrary positive real number,
because the real numbers have not yet been constructed. However, once
we do construct the real numbers, we shall see that the above definition
will not change if we require ε to be real instead of rational
(Proposition 6.1.4). In other words, we will eventually prove that a sequence is
eventually ε-steady for every rational ε > 0 if and only if it is eventually
ε-steady for every real ε > 0; see Proposition 6.1.4. This rather subtle
distinction between a rational ε and a real ε turns out not to be very
important in the long run, and the reader is advised not to pay too much
attention as to what type of number ε should be.

Example 5.1.10. (Informal) Consider the sequence

1.4, 1.41, 1.414, 1.4142, ...

mentioned earlier. This sequence is already 1-steady. If one discards the
first element 1.4, then the remaining sequence

1.41, 1.414, 1.4142, ...

is now 0.1-steady, which means that the original sequence was
eventually 0.1-steady. Discarding the next element gives the 0.01-steady
sequence 1.414, 1.4142, ...; thus the original sequence was eventually
0.01-steady. Continuing in this way it seems plausible that this sequence is
in fact eventually ε-steady for every ε > 0, which seems to suggest that
this is a Cauchy sequence. However, this discussion is not rigorous for several
reasons, for instance we have not precisely defined what this sequence
1.4, 1.41, 1.414, ... really is. An example of a rigorous treatment follows
next.

Proposition 5.1.11. The sequence $a_1, a_2, a_3, \ldots$ defined by $a_n := 1/n$
(i.e., the sequence $1, 1/2, 1/3, \ldots$) is a Cauchy sequence.

Proof. We have to show that for every $\varepsilon > 0$, the sequence $a_1, a_2, \ldots$ is
eventually $\varepsilon$-steady. So let $\varepsilon > 0$ be arbitrary. We now have to find a
number $N \geq 1$ such that the sequence $a_N, a_{N+1}, \ldots$ is $\varepsilon$-steady. Let us
see what this means. This means that $d(a_j, a_k) \leq \varepsilon$ for every $j, k \geq N$,
i.e.

$|1/j - 1/k| \leq \varepsilon$ for every $j, k \geq N$.

Now since $j, k \geq N$, we know that $0 < 1/j, 1/k \leq 1/N$, so that
$|1/j - 1/k| \leq 1/N$. So in order to force $|1/j - 1/k|$ to be less than or equal to
$\varepsilon$, it would be sufficient for $1/N$ to be less than $\varepsilon$. So all we need to do
is choose an $N$ such that $1/N$ is less than $\varepsilon$, or in other words that $N$ is
greater than $1/\varepsilon$. But this can be done thanks to Proposition 4.4.1. □
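
The "think in reverse" choice of $N$ in this proof can be tried out numerically. The following is only a sketch under stated assumptions: the function names `steady_index` and `is_steady_from` are hypothetical, the choice $N := \lfloor 1/\varepsilon \rfloor + 1$ is just one concrete $N$ greater than $1/\varepsilon$, and the check covers finitely many terms, so it illustrates the proof rather than replacing it.

```python
from fractions import Fraction
from math import floor

def steady_index(eps: Fraction) -> int:
    """Return one N >= 1 with N > 1/eps (hypothetical helper)."""
    # floor(1/eps) + 1 is an integer strictly greater than 1/eps, so 1/N < eps.
    return floor(1 / eps) + 1

def is_steady_from(N: int, eps: Fraction, horizon: int) -> bool:
    """Spot check |1/j - 1/k| <= eps for all N <= j, k < N + horizon."""
    terms = [Fraction(1, n) for n in range(N, N + horizon)]
    return all(abs(x - y) <= eps for x in terms for y in terms)

eps = Fraction(1, 100)
N = steady_index(eps)
print(N, is_steady_from(N, eps, 200))
```

Exact rational arithmetic (`Fraction`) is used on purpose: at this stage of the text only the rationals are available, and floating-point rounding would obscure the inequalities being tested.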

As you can see, verifying from first principles (i.e., without using any 
of the machinery of limits, etc.) that a sequence is a Cauchy sequence 
requires some effort, even for a sequence as simple as $1/n$. The part
about selecting an $N$ can be particularly difficult for beginners: one has
to think in reverse, working out what conditions on $N$ would suffice to
force the sequence $a_N, a_{N+1}, a_{N+2}, \ldots$ to be $\varepsilon$-steady, and then finding
an $N$ which obeys those conditions. Later we will develop some limit
laws which allow us to determine when a sequence is Cauchy more easily.

We now relate the notion of a Cauchy sequence to another basic 
notion, that of a bounded sequence. 

Definition 5.1.12 (Bounded sequences). Let $M \geq 0$ be rational. A
finite sequence $a_1, a_2, \ldots, a_n$ is bounded by $M$ iff $|a_i| \leq M$ for all
$1 \leq i \leq n$. An infinite sequence $(a_n)_{n=1}^\infty$ is bounded by $M$ iff $|a_i| \leq M$ for all
$i \geq 1$. A sequence is said to be bounded iff it is bounded by $M$ for some
rational $M \geq 0$.

Example 5.1.13. The finite sequence $1, -2, 3, -4$ is bounded (in this
case, it is bounded by 4, or indeed by any $M$ greater than or equal to
4). But the infinite sequence $1, -2, 3, -4, 5, -6, \ldots$ is unbounded. (Can
you prove this? Use Proposition 4.4.1.) The sequence $1, -1, 1, -1, \ldots$ is
bounded (e.g., by 1), but is not a Cauchy sequence.

Lemma 5.1.14 (Finite sequences are bounded). Every finite sequence
$a_1, a_2, \ldots, a_n$ is bounded.

Proof. We prove this by induction on $n$. When $n = 1$ the sequence $a_1$ is
clearly bounded, for if we choose $M := |a_1|$ then clearly we have $|a_i| \leq M$
for all $1 \leq i \leq n$. Now suppose that we have already proved the lemma
for some $n \geq 1$; we now prove it for $n + 1$, i.e., we prove every sequence
$a_1, a_2, \ldots, a_{n+1}$ is bounded. By the induction hypothesis we know that
$a_1, a_2, \ldots, a_n$ is bounded by some $M \geq 0$; in particular, it must be
bounded by $M + |a_{n+1}|$. On the other hand, $a_{n+1}$ is also bounded by
$M + |a_{n+1}|$. Thus $a_1, a_2, \ldots, a_n, a_{n+1}$ is bounded by $M + |a_{n+1}|$, and is
hence bounded. This closes the induction. □
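
The induction above is constructive, and the bound it produces can be computed directly. The sketch below follows the proof step by step; the name `inductive_bound` is hypothetical, and the bound obtained this way is usually far from the smallest one.

```python
from fractions import Fraction

def inductive_bound(seq):
    """Bound for a non-empty finite sequence, built as in the induction:
    start with M = |a_1|, then replace M by M + |a_{n+1}| at each step."""
    M = abs(seq[0])
    for x in seq[1:]:
        M = M + abs(x)
    return M

# The finite sequence 1, -2, 3, -4 from Example 5.1.13.
seq = [Fraction(1), Fraction(-2), Fraction(3), Fraction(-4)]
M = inductive_bound(seq)
print(M, all(abs(x) <= M for x in seq))
```

For this sequence the procedure returns 10, whereas Example 5.1.13 notes that 4 already suffices; the lemma only claims that some bound exists.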

Note that while this argument shows that every finite sequence is
bounded, no matter how long the finite sequence is, it does not say
anything about whether an infinite sequence is bounded or not; infinity
is not a natural number. However, we have

Lemma 5.1.15 (Cauchy sequences are bounded). Every Cauchy sequence
$(a_n)_{n=1}^\infty$ is bounded.

Proof. See Exercise 5.1.1. □ 


— Exercises — 

Exercise 5.1.1. Prove Lemma 5.1.15. (Hint: use the fact that $(a_n)_{n=1}^\infty$ is eventually
1-steady, and thus can be split into a finite sequence and a 1-steady sequence.
Then use Lemma 5.1.14 for the finite part. Note there is nothing special about
the number 1 used here; any other positive number would have sufficed.)


5.2 Equivalent Cauchy sequences 


Consider the two Cauchy sequences of rational numbers: 

1.4, 1.41, 1.414, 1.4142, 1.41421, . . . 


and 


1.5, 1.42, 1.415, 1.4143, 1.41422, . . . 


Informally, both of these sequences seem to be converging to the same
number, the square root $\sqrt{2} = 1.41421\ldots$ (though this statement is not
yet rigorous because we have not defined real numbers yet). If we are to
define the real numbers from the rationals as limits of Cauchy sequences, 
we have to know when two Cauchy sequences of rationals give the same 
limit, without first defining a real number (since that would be circular). 
To do this we use a similar set of definitions to those used to define a 
Cauchy sequence in the first place. 


Definition 5.2.1 ($\varepsilon$-close sequences). Let $(a_n)_{n=0}^\infty$ and $(b_n)_{n=0}^\infty$ be two
sequences, and let $\varepsilon > 0$. We say that the sequence $(a_n)_{n=0}^\infty$ is $\varepsilon$-close to
$(b_n)_{n=0}^\infty$ iff $a_n$ is $\varepsilon$-close to $b_n$ for each $n \in \mathbf{N}$. In other words, the sequence
$a_0, a_1, a_2, \ldots$ is $\varepsilon$-close to the sequence $b_0, b_1, b_2, \ldots$ iff $|a_n - b_n| \leq \varepsilon$ for
all $n = 0, 1, 2, \ldots$.


Example 5.2.2. The two sequences

1, -1, 1, -1, 1, ...

and

1.1, -1.1, 1.1, -1.1, 1.1, ...

are 0.1-close to each other. (Note however that neither of them is
0.1-steady.)

Definition 5.2.3 (Eventually $\varepsilon$-close sequences). Let $(a_n)_{n=0}^\infty$ and
$(b_n)_{n=0}^\infty$ be two sequences, and let $\varepsilon > 0$. We say that the sequence
$(a_n)_{n=0}^\infty$ is eventually $\varepsilon$-close to $(b_n)_{n=0}^\infty$ iff there exists an $N \geq 0$ such
that the sequences $(a_n)_{n=N}^\infty$ and $(b_n)_{n=N}^\infty$ are $\varepsilon$-close. In other words,
$a_0, a_1, a_2, \ldots$ is eventually $\varepsilon$-close to $b_0, b_1, b_2, \ldots$ iff there exists an $N \geq 0$
such that $|a_n - b_n| \leq \varepsilon$ for all $n \geq N$.

Remark 5.2.4. Again, the notations for $\varepsilon$-close sequences and eventually
$\varepsilon$-close sequences are not standard in the literature, and we will not
use them outside of this section.

Example 5.2.5. The two sequences 


1.1, 1.01, 1.001, 1.0001, ...


and 


0.9,0.99,0.999,0.9999,... 


are not 0.1-close (because the first elements of both sequences are not 
0.1-close to each other). However, the sequences are still eventually 
0.1-close, because if we start from the second elements onwards in the 
sequence, these sequences are 0.1-close. A similar argument shows that 
the two sequences are eventually 0.01-close (by starting from the third 
element onwards), and so forth. 


Definition 5.2.6 (Equivalent sequences). Two sequences $(a_n)_{n=0}^\infty$ and
$(b_n)_{n=0}^\infty$ are equivalent iff for each rational $\varepsilon > 0$, the sequences $(a_n)_{n=0}^\infty$
and $(b_n)_{n=0}^\infty$ are eventually $\varepsilon$-close. In other words, $a_0, a_1, a_2, \ldots$ and
$b_0, b_1, b_2, \ldots$ are equivalent iff for every rational $\varepsilon > 0$, there exists an
$N \geq 0$ such that $|a_n - b_n| \leq \varepsilon$ for all $n \geq N$.

Remark 5.2.7. As with Definition 5.1.8, the quantity $\varepsilon > 0$ is currently
restricted to be a positive rational, rather than a positive real. However,
we shall eventually see that it makes no difference whether $\varepsilon$ ranges over
the positive rationals or positive reals; see Exercise 6.1.10.


From Definition 5.2.6 it seems that the two sequences given in Ex- 
ample 5.2.5 appear to be equivalent. We now prove this rigorously. 

Proposition 5.2.8. Let $(a_n)_{n=1}^\infty$ and $(b_n)_{n=1}^\infty$ be the sequences
$a_n = 1 + 10^{-n}$ and $b_n = 1 - 10^{-n}$. Then the sequences $a_n$, $b_n$ are equivalent.

Remark 5.2.9. This Proposition, in decimal notation, asserts that
1.0000... = 0.9999...; see Proposition B.2.3.

Proof. We need to prove that for every $\varepsilon > 0$, the two sequences $(a_n)_{n=1}^\infty$
and $(b_n)_{n=1}^\infty$ are eventually $\varepsilon$-close to each other. So we fix an $\varepsilon > 0$.
We need to find an $N \geq 1$ such that $(a_n)_{n=N}^\infty$ and $(b_n)_{n=N}^\infty$ are $\varepsilon$-close;
in other words, we need to find an $N \geq 1$ such that

$|a_n - b_n| \leq \varepsilon$ for all $n \geq N$.

However, we have

$|a_n - b_n| = |(1 + 10^{-n}) - (1 - 10^{-n})| = 2 \times 10^{-n}$.

Since $10^{-n}$ is a decreasing function of $n$ (i.e., $10^{-m} < 10^{-n}$ whenever
$m > n$; this is easily proven by induction), and $n \geq N$, we have
$2 \times 10^{-n} \leq 2 \times 10^{-N}$. Thus we have

$|a_n - b_n| \leq 2 \times 10^{-N}$ for all $n \geq N$.

Thus in order to obtain $|a_n - b_n| \leq \varepsilon$ for all $n \geq N$, it will be sufficient
to choose $N$ so that $2 \times 10^{-N} \leq \varepsilon$. This is easy to do using logarithms,
but we have not developed logarithms yet, so we will use a cruder
method. First, we observe that $10^N$ is always greater than $N$ for any $N \geq 1$
(see Exercise 4.3.5). Thus $10^{-N} \leq 1/N$, and so $2 \times 10^{-N} \leq 2/N$. Thus
to get $2 \times 10^{-N} \leq \varepsilon$, it will suffice to choose $N$ so that $2/N \leq \varepsilon$, or
equivalently that $N \geq 2/\varepsilon$. But by Proposition 4.4.1 we can always
choose such an $N$, and the claim follows. □
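
The crude choice $N \geq 2/\varepsilon$ made in this proof is easy to test with exact rational arithmetic. The sketch below is an illustration only: the helper names `close_index` and `eventually_close` are hypothetical, and the check examines a finite window of indices rather than all $n \geq N$.

```python
from fractions import Fraction
from math import ceil

def a(n: int) -> Fraction:
    return 1 + Fraction(1, 10 ** n)   # a_n = 1 + 10^{-n}

def b(n: int) -> Fraction:
    return 1 - Fraction(1, 10 ** n)   # b_n = 1 - 10^{-n}

def close_index(eps: Fraction) -> int:
    """The crude choice from the proof: an integer N >= 1 with N >= 2/eps."""
    return max(1, ceil(2 / eps))

def eventually_close(eps: Fraction, horizon: int = 50) -> bool:
    """Spot check |a_n - b_n| <= eps for N <= n < N + horizon."""
    N = close_index(eps)
    return all(abs(a(n) - b(n)) <= eps for n in range(N, N + horizon))

print(close_index(Fraction(1, 1000)), eventually_close(Fraction(1, 1000)))
```

The value of $N$ found this way (2000 when $\varepsilon = 1/1000$) is enormously larger than necessary, since already $2 \times 10^{-4} \leq 1/1000$; this is the price of avoiding logarithms, and it does no harm because the definition only asks that some $N$ exist.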

— Exercises — 

Exercise 5.2.1. Show that if $(a_n)_{n=1}^\infty$ and $(b_n)_{n=1}^\infty$ are equivalent sequences of
rationals, then $(a_n)_{n=1}^\infty$ is a Cauchy sequence if and only if $(b_n)_{n=1}^\infty$ is a Cauchy
sequence.

Exercise 5.2.2. Let $\varepsilon > 0$. Show that if $(a_n)_{n=1}^\infty$ and $(b_n)_{n=1}^\infty$ are eventually
$\varepsilon$-close, then $(a_n)_{n=1}^\infty$ is bounded if and only if $(b_n)_{n=1}^\infty$ is bounded.

5.3 The construction of the real numbers 

We are now ready to construct the real numbers. We shall introduce
a new formal symbol LIM, similar to the formal notations $--$ and $//$
defined earlier; as the notation suggests, this will eventually match the
familiar operation of lim, at which point the formal limit symbol can be
discarded.

Definition 5.3.1 (Real numbers). A real number is defined to be an
object of the form $\mathrm{LIM}_{n\to\infty} a_n$, where $(a_n)_{n=1}^\infty$ is a Cauchy sequence of
rational numbers. Two real numbers $\mathrm{LIM}_{n\to\infty} a_n$ and $\mathrm{LIM}_{n\to\infty} b_n$ are
said to be equal iff $(a_n)_{n=1}^\infty$ and $(b_n)_{n=1}^\infty$ are equivalent Cauchy sequences.
The set of all real numbers is denoted $\mathbf{R}$.

Example 5.3.2. (Informal) Let $a_1, a_2, a_3, \ldots$ denote the sequence

1.4, 1.41, 1.414, 1.4142, 1.41421, ...

and let $b_1, b_2, b_3, \ldots$ denote the sequence

1.5, 1.42, 1.415, 1.4143, 1.41422, ...

then $\mathrm{LIM}_{n\to\infty} a_n$ is a real number, and is the same real number as
$\mathrm{LIM}_{n\to\infty} b_n$, because $(a_n)_{n=1}^\infty$ and $(b_n)_{n=1}^\infty$ are equivalent Cauchy
sequences: $\mathrm{LIM}_{n\to\infty} a_n = \mathrm{LIM}_{n\to\infty} b_n$.

We will refer to $\mathrm{LIM}_{n\to\infty} a_n$ as the formal limit of the sequence
$(a_n)_{n=1}^\infty$. Later on we will define a genuine notion of limit, and show
that the formal limit of a Cauchy sequence is the same as the limit
of that sequence; after that, we will not need formal limits ever again.
(The situation is much like what we did with formal subtraction $--$ and
formal division $//$.)

In order to ensure that this definition is valid, we need to check 
that the notion of equality in the definition obeys the first three laws of 
equality: 

Proposition 5.3.3 (Formal limits are well-defined). Let
$x = \mathrm{LIM}_{n\to\infty} a_n$, $y = \mathrm{LIM}_{n\to\infty} b_n$, and $z = \mathrm{LIM}_{n\to\infty} c_n$ be real numbers.
Then, with the above definition of equality for real numbers, we have
$x = x$. Also, if $x = y$, then $y = x$. Finally, if $x = y$ and $y = z$, then
$x = z$.

Proof. See Exercise 5.3.1. □ 

Because of this proposition, we know that our definition of equality 
between two real numbers is legitimate. Of course, when we define other 
operations on the reals, we have to check that they obey the law of 
substitution: two real number inputs which are equal should give equal 
outputs when applied to any operation on the real numbers. 

Now we want to define on the real numbers all the usual arithmetic
operations, such as addition and multiplication. We begin with addition.

Definition 5.3.4 (Addition of reals). Let $x = \mathrm{LIM}_{n\to\infty} a_n$ and
$y = \mathrm{LIM}_{n\to\infty} b_n$ be real numbers. Then we define the sum $x + y$ to be
$x + y := \mathrm{LIM}_{n\to\infty} (a_n + b_n)$.

Example 5.3.5. The sum of $\mathrm{LIM}_{n\to\infty} 1 + 1/n$ and $\mathrm{LIM}_{n\to\infty} 2 + 3/n$ is
$\mathrm{LIM}_{n\to\infty} 3 + 4/n$.

We now check that this definition is valid. The first thing we need 
to do is to confirm that the sum of two real numbers is in fact a real 
number: 

Lemma 5.3.6 (Sum of Cauchy sequences is Cauchy). Let
$x = \mathrm{LIM}_{n\to\infty} a_n$ and $y = \mathrm{LIM}_{n\to\infty} b_n$ be real numbers. Then $x + y$ is also a
real number (i.e., $(a_n + b_n)_{n=1}^\infty$ is a Cauchy sequence of rationals).

Proof. We need to show that for every $\varepsilon > 0$, the sequence $(a_n + b_n)_{n=1}^\infty$
is eventually $\varepsilon$-steady. Now from hypothesis we know that $(a_n)_{n=1}^\infty$ is
eventually $\varepsilon$-steady, and $(b_n)_{n=1}^\infty$ is eventually $\varepsilon$-steady, but it turns out
that this is not quite enough (this can be used to imply that $(a_n + b_n)_{n=1}^\infty$
is eventually $2\varepsilon$-steady, but that's not what we want). So we need to do
a little trick, which is to play with the value of $\varepsilon$.

We know that $(a_n)_{n=1}^\infty$ is eventually $\delta$-steady for every value of $\delta$.
This implies not only that $(a_n)_{n=1}^\infty$ is eventually $\varepsilon$-steady, but it is also
eventually $\varepsilon/2$-steady. Similarly, the sequence $(b_n)_{n=1}^\infty$ is also eventually
$\varepsilon/2$-steady. This will turn out to be enough to conclude that $(a_n + b_n)_{n=1}^\infty$
is eventually $\varepsilon$-steady.

Since $(a_n)_{n=1}^\infty$ is eventually $\varepsilon/2$-steady, we know that there exists an
$N \geq 1$ such that $(a_n)_{n=N}^\infty$ is $\varepsilon/2$-steady, i.e., $a_n$ and $a_m$ are $\varepsilon/2$-close for
every $n, m \geq N$. Similarly there exists an $M \geq 1$ such that $(b_n)_{n=M}^\infty$ is
$\varepsilon/2$-steady, i.e., $b_n$ and $b_m$ are $\varepsilon/2$-close for every $n, m \geq M$.

Let $\max(N, M)$ be the larger of $N$ and $M$ (we know from Proposition
2.2.13 that one has to be greater than or equal to the other). If
$n, m \geq \max(N, M)$, then we know that $a_n$ and $a_m$ are $\varepsilon/2$-close, and $b_n$ and
$b_m$ are $\varepsilon/2$-close, and so by Proposition 4.3.7 we see that $a_n + b_n$ and
$a_m + b_m$ are $\varepsilon$-close for every $n, m \geq \max(N, M)$. This implies that the
sequence $(a_n + b_n)_{n=1}^\infty$ is eventually $\varepsilon$-steady, as desired. □
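
Written out as a single chain of inequalities, the last step of this proof reads as follows. This is only an unpacking of the appeal to Proposition 4.3.7 in terms of the triangle inequality, valid for all $n, m \geq \max(N, M)$:

```latex
\begin{align*}
|(a_n + b_n) - (a_m + b_m)|
  &= |(a_n - a_m) + (b_n - b_m)| \\
  &\leq |a_n - a_m| + |b_n - b_m| \\
  &\leq \varepsilon/2 + \varepsilon/2 = \varepsilon .
\end{align*}
```

Had we only known that each sequence was eventually $\varepsilon$-steady, the same chain would end in $2\varepsilon$, which is why the proof asks for $\varepsilon/2$ at the outset.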

The other thing we need to check is the axiom of substitution (see
Section A.7): if we replace a real number $x$ by another number equal to
$x$, this should not change the sum $x + y$ (and similarly if we substitute
$y$ by another number equal to $y$).





Lemma 5.3.7 (Sums of equivalent Cauchy sequences are equivalent).
Let $x = \mathrm{LIM}_{n\to\infty} a_n$, $y = \mathrm{LIM}_{n\to\infty} b_n$, and $x' = \mathrm{LIM}_{n\to\infty} a'_n$ be real
numbers. Suppose that $x = x'$. Then we have $x + y = x' + y$.

Proof. Since $x$ and $x'$ are equal, we know that the Cauchy sequences
$(a_n)_{n=1}^\infty$ and $(a'_n)_{n=1}^\infty$ are equivalent, so in other words they are eventually
$\varepsilon$-close for each $\varepsilon > 0$. We need to show that the sequences $(a_n + b_n)_{n=1}^\infty$
and $(a'_n + b_n)_{n=1}^\infty$ are eventually $\varepsilon$-close for each $\varepsilon > 0$. But we already
know that there is an $N \geq 1$ such that $(a_n)_{n=N}^\infty$ and $(a'_n)_{n=N}^\infty$ are $\varepsilon$-close,
i.e., that $a_n$ and $a'_n$ are $\varepsilon$-close for each $n \geq N$. Since $b_n$ is of course
0-close to $b_n$, we thus see from Proposition 4.3.7 (extended in the obvious
manner to the $\delta = 0$ case) that $a_n + b_n$ and $a'_n + b_n$ are $\varepsilon$-close for each
$n \geq N$. This implies that $(a_n + b_n)_{n=1}^\infty$ and $(a'_n + b_n)_{n=1}^\infty$ are eventually
$\varepsilon$-close for each $\varepsilon > 0$, and we are done. □

Remark 5.3.8. The above lemma verifies the axiom of substitution
for the “x” variable in $x + y$, but one can similarly prove the axiom
of substitution for the “y” variable. (A quick way is to observe from
the definition of $x + y$ that we certainly have $x + y = y + x$, since
$a_n + b_n = b_n + a_n$.)

We can define multiplication of real numbers in a manner similar to 
that of addition: 

Definition 5.3.9 (Multiplication of reals). Let $x = \mathrm{LIM}_{n\to\infty} a_n$ and
$y = \mathrm{LIM}_{n\to\infty} b_n$ be real numbers. Then we define the product $xy$ to be
$xy := \mathrm{LIM}_{n\to\infty} a_n b_n$.

The following Proposition ensures that this definition is valid, and
that the product of two real numbers is in fact a real number:

Proposition 5.3.10 (Multiplication is well defined). Let
$x = \mathrm{LIM}_{n\to\infty} a_n$, $y = \mathrm{LIM}_{n\to\infty} b_n$, and $x' = \mathrm{LIM}_{n\to\infty} a'_n$ be real numbers.
Then $xy$ is also a real number. Furthermore, if $x = x'$, then $xy = x'y$.

Proof. See Exercise 5.3.2. □

Of course we can prove a similar substitution rule when y is replaced 
by a real number y' which is equal to y. 

At this point we embed the rationals back into the reals, by equating
every rational number $q$ with the real number $\mathrm{LIM}_{n\to\infty} q$. For instance,
if $a_1, a_2, a_3, \ldots$ is the sequence

0.5, 0.5, 0.5, 0.5, 0.5, ...

then we set $\mathrm{LIM}_{n\to\infty} a_n$ equal to 0.5. This embedding is consistent
with our definitions of addition and multiplication, since for any rational
numbers $a$, $b$ we have

$(\mathrm{LIM}_{n\to\infty} a) + (\mathrm{LIM}_{n\to\infty} b) = \mathrm{LIM}_{n\to\infty} (a + b)$ and
$(\mathrm{LIM}_{n\to\infty} a) \times (\mathrm{LIM}_{n\to\infty} b) = \mathrm{LIM}_{n\to\infty} (ab)$;

this means that when one wants to add or multiply two rational numbers
$a$, $b$ it does not matter whether one thinks of these numbers as rationals
or as the real numbers $\mathrm{LIM}_{n\to\infty} a$, $\mathrm{LIM}_{n\to\infty} b$. Also, this identification
of rational numbers and real numbers is consistent with our definitions
of equality (Exercise 5.3.3).

We can now easily define negation $-x$ for real numbers $x$ by the
formula

$-x := (-1) \times x$,

since $-1$ is a rational number and is hence real. Note that this is clearly
consistent with our negation for rational numbers since we have
$-q = (-1) \times q$ for all rational numbers $q$. Also, from our definitions it is clear
that

$-\mathrm{LIM}_{n\to\infty} a_n = \mathrm{LIM}_{n\to\infty} (-a_n)$

(why?). Once we have addition and negation, we can define subtraction
as usual by

$x - y := x + (-y)$;

note that this implies

$\mathrm{LIM}_{n\to\infty} a_n - \mathrm{LIM}_{n\to\infty} b_n = \mathrm{LIM}_{n\to\infty} (a_n - b_n)$.

We can now easily show that the real numbers obey all the usual 
rules of algebra (except perhaps for the laws involving division, which 
we shall address shortly): 

Proposition 5.3.11. All the laws of algebra from Proposition 4.1.6 hold
not only for the integers, but for the reals as well.

Proof. We illustrate this with one such rule: $x(y + z) = xy + xz$. Let
$x = \mathrm{LIM}_{n\to\infty} a_n$, $y = \mathrm{LIM}_{n\to\infty} b_n$, and $z = \mathrm{LIM}_{n\to\infty} c_n$ be real numbers.
Then by definition, $xy = \mathrm{LIM}_{n\to\infty} a_n b_n$ and $xz = \mathrm{LIM}_{n\to\infty} a_n c_n$, and so
$xy + xz = \mathrm{LIM}_{n\to\infty} (a_n b_n + a_n c_n)$. A similar line of reasoning shows that
$x(y + z) = \mathrm{LIM}_{n\to\infty} a_n (b_n + c_n)$. But we already know that $a_n (b_n + c_n)$
is equal to $a_n b_n + a_n c_n$ for the rational numbers $a_n, b_n, c_n$, and the claim
follows. The other laws of algebra are proven similarly. □

The last basic arithmetic operation we need to define is reciprocation:
$x \to x^{-1}$. This one is a little more subtle. An obvious first guess for
how to proceed would be to define

$(\mathrm{LIM}_{n\to\infty} a_n)^{-1} := \mathrm{LIM}_{n\to\infty} a_n^{-1}$,

but there are a few problems with this. For instance, let $a_1, a_2, a_3, \ldots$
be the Cauchy sequence

0.1, 0.01, 0.001, 0.0001, ...,

and let $x := \mathrm{LIM}_{n\to\infty} a_n$. Then by this definition, $x^{-1}$ would be
$\mathrm{LIM}_{n\to\infty} b_n$, where $b_1, b_2, b_3, \ldots$ is the sequence

10, 100, 1000, 10000, ...

but this is not a Cauchy sequence (it isn't even bounded). Of course, the
problem here is that our original Cauchy sequence $(a_n)_{n=1}^\infty$ was equivalent
to the zero sequence $(0)_{n=1}^\infty$ (why?), and hence that our real number
$x$ was in fact equal to 0. So we should only allow the operation of
reciprocal when $x$ is non-zero.

However, even when we restrict ourselves to non-zero real numbers, 
we have a slight problem, because a non-zero real number might be the 
formal limit of a Cauchy sequence which contains zero elements. For 
instance, the number 1, which is rational and hence real, is the formal
limit $1 = \mathrm{LIM}_{n\to\infty} a_n$ of the Cauchy sequence

0, 0.9, 0.99, 0.999, 0.9999, ...

but using our naive definition of reciprocal, we cannot invert the real 
number 1, because we can’t invert the first element 0 of this Cauchy 
sequence! 

To get around these problems we need to keep our Cauchy sequence 
away from zero. To do this we first need a definition. 

Definition 5.3.12 (Sequences bounded away from zero). A sequence
$(a_n)_{n=1}^\infty$ of rational numbers is said to be bounded away from zero iff
there exists a rational number $c > 0$ such that $|a_n| \geq c$ for all $n \geq 1$.

Examples 5.3.13. The sequence $1, -1, 1, -1, 1, -1, 1, \ldots$ is bounded
away from zero (all the terms have absolute value at least 1). But
the sequence 0.1, 0.01, 0.001, ... is not bounded away from zero, and
neither is 0, 0.9, 0.99, 0.999, 0.9999, .... The sequence 10, 100, 1000, ... is
bounded away from zero, but is not bounded.

We now show that every non-zero real number is the formal limit of
a Cauchy sequence bounded away from zero:

Lemma 5.3.14. Let $x$ be a non-zero real number. Then
$x = \mathrm{LIM}_{n\to\infty} a_n$ for some Cauchy sequence $(a_n)_{n=1}^\infty$ which is bounded away
from zero.

Proof. Since $x$ is real, we know that $x = \mathrm{LIM}_{n\to\infty} b_n$ for some Cauchy
sequence $(b_n)_{n=1}^\infty$. But we are not yet done, because we do not know
that $b_n$ is bounded away from zero. On the other hand, we are given
that $x \neq 0 = \mathrm{LIM}_{n\to\infty} 0$, which means that the sequence $(b_n)_{n=1}^\infty$ is not
equivalent to $(0)_{n=1}^\infty$. Thus the sequence $(b_n)_{n=1}^\infty$ cannot be eventually
$\varepsilon$-close to $(0)_{n=1}^\infty$ for every $\varepsilon > 0$. Therefore we can find an $\varepsilon > 0$ such
that $(b_n)_{n=1}^\infty$ is not eventually $\varepsilon$-close to $(0)_{n=1}^\infty$.

Let us fix this $\varepsilon$. We know that $(b_n)_{n=1}^\infty$ is a Cauchy sequence, so it is
eventually $\varepsilon$-steady. Moreover, it is eventually $\varepsilon/2$-steady, since $\varepsilon/2 > 0$.
Thus there is an $N \geq 1$ such that $|b_n - b_m| \leq \varepsilon/2$ for all $n, m \geq N$.

On the other hand, we cannot have $|b_n| \leq \varepsilon$ for all $n \geq N$, since this
would imply that $(b_n)_{n=1}^\infty$ is eventually $\varepsilon$-close to $(0)_{n=1}^\infty$. Thus there
must be some $n_0 \geq N$ for which $|b_{n_0}| > \varepsilon$. Since we already know
that $|b_{n_0} - b_n| \leq \varepsilon/2$ for all $n \geq N$, we thus conclude from the triangle
inequality (how?) that $|b_n| \geq \varepsilon/2$ for all $n \geq N$.

This almost proves that $(b_n)_{n=1}^\infty$ is bounded away from zero. Actually,
what it does is show that $(b_n)_{n=1}^\infty$ is eventually bounded away from
zero. But this is easily fixed, by defining a new sequence $a_n$, by setting
$a_n := \varepsilon/2$ if $n < N$ and $a_n := b_n$ if $n \geq N$. Since $b_n$ is a Cauchy
sequence, it is not hard to verify that $a_n$ is also a Cauchy sequence which
is equivalent to $b_n$ (because the two sequences are eventually the same),
and so $x = \mathrm{LIM}_{n\to\infty} a_n$. And since $|b_n| \geq \varepsilon/2$ for all $n \geq N$, we know
that $|a_n| \geq \varepsilon/2$ for all $n \geq 1$ (splitting into the two cases $n \geq N$ and
$n < N$ separately). Thus we have a Cauchy sequence which is bounded
away from zero (by $\varepsilon/2$ instead of $\varepsilon$, but that's still OK since $\varepsilon/2 > 0$),
and which has $x$ as a formal limit, and so we are done. □
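
The patching step at the end of this proof can be carried out concretely on the sequence 0, 0.9, 0.99, 0.999, ... considered earlier. The sketch below is an illustration under stated assumptions: the helper name `patch` is hypothetical, and the values $\varepsilon = 1/2$ and $N = 2$ are chosen by hand for this particular sequence rather than derived by the code.

```python
from fractions import Fraction

def b(n: int) -> Fraction:
    # The sequence 0, 0.9, 0.99, 0.999, ...: b_n = 1 - 10^{-(n-1)} for n >= 1.
    return 1 - Fraction(1, 10 ** (n - 1))

def patch(seq, N: int, eps: Fraction):
    """Return the sequence a_n := eps/2 for n < N and a_n := seq(n) for n >= N."""
    return lambda n: eps / 2 if n < N else seq(n)

# Hand-picked for this b: the terms b_n with n >= 2 lie between 0.9 and 1,
# so they are pairwise (eps/2)-close and satisfy |b_n| >= eps/2 for eps = 1/2.
eps, N = Fraction(1, 2), 2
a = patch(b, N, eps)

print([a(n) for n in range(1, 5)])
print(all(abs(a(n)) >= eps / 2 for n in range(1, 40)))
```

Only the single term $b_1 = 0$ is altered, which is why the new sequence is equivalent to the old one and still has formal limit 1.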

Once a sequence is bounded away from zero, we can take its recip- 
rocal without any difficulty: 

Lemma 5.3.15. Suppose that $(a_n)_{n=1}^\infty$ is a Cauchy sequence which is
bounded away from zero. Then the sequence $(a_n^{-1})_{n=1}^\infty$ is also a Cauchy
sequence.

Proof. Since $(a_n)_{n=1}^\infty$ is bounded away from zero, we know that there is
a $c > 0$ such that $|a_n| \geq c$ for all $n \geq 1$. Now we need to show that
$(a_n^{-1})_{n=1}^\infty$ is eventually $\varepsilon$-steady for each $\varepsilon > 0$. Thus let us fix an $\varepsilon > 0$;
our task is now to find an $N \geq 1$ such that $|a_n^{-1} - a_m^{-1}| \leq \varepsilon$ for all
$n, m \geq N$. But

$|a_n^{-1} - a_m^{-1}| = \left| \frac{a_m - a_n}{a_m a_n} \right| \leq \frac{|a_m - a_n|}{c^2}$

(since $|a_m|, |a_n| \geq c$), and so to make $|a_n^{-1} - a_m^{-1}|$ less than or equal to
$\varepsilon$, it will suffice to make $|a_m - a_n|$ less than or equal to $c^2 \varepsilon$. But since
$(a_n)_{n=1}^\infty$ is a Cauchy sequence, and $c^2 \varepsilon > 0$, we can certainly find an $N$
such that the sequence $(a_n)_{n=N}^\infty$ is $c^2 \varepsilon$-steady, i.e., $|a_m - a_n| \leq c^2 \varepsilon$ for all
$n, m \geq N$. By what we have said above, this shows that $|a_n^{-1} - a_m^{-1}| \leq \varepsilon$ for
all $m, n \geq N$, and hence the sequence $(a_n^{-1})_{n=1}^\infty$ is eventually $\varepsilon$-steady.
Since we have proven this for every $\varepsilon$, we have that $(a_n^{-1})_{n=1}^\infty$ is a Cauchy
sequence, as desired. □
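
The role of the quantity $c^2 \varepsilon$ can be seen in a small experiment. The sketch below is an illustration under assumptions of our own: it uses the sequence $a_n := 1 + 1/n$, which is bounded away from zero by $c = 1$ and satisfies $|a_m - a_n| \leq 1/N$ for $n, m \geq N$, and the helper name `check` is hypothetical. It inspects only a finite window of indices.

```python
from fractions import Fraction
from math import ceil

def a(n: int) -> Fraction:
    return 1 + Fraction(1, n)   # |a_n| >= c with c = 1

def check(eps: Fraction, c: Fraction = Fraction(1), horizon: int = 60):
    """Pick N with 1/N <= c^2 * eps, then spot check that the reciprocals
    a_n^{-1}, a_m^{-1} are eps-close for N <= n, m < N + horizon."""
    N = max(1, ceil(1 / (c * c * eps)))
    idx = range(N, N + horizon)
    ok = all(abs(1 / a(n) - 1 / a(m)) <= eps for n in idx for m in idx)
    return N, ok

print(check(Fraction(1, 50)))
```

A smaller $c$ forces a larger $N$ for the same $\varepsilon$: the closer the sequence is allowed to come to zero, the more its reciprocals can spread apart.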


We are now ready to define reciprocation: 

Definition 5.3.16 (Reciprocals of real numbers). Let $x$ be a non-zero
real number. Let $(a_n)_{n=1}^\infty$ be a Cauchy sequence bounded away from zero
such that $x = \mathrm{LIM}_{n\to\infty} a_n$ (such a sequence exists by Lemma 5.3.14).
Then we define the reciprocal $x^{-1}$ by the formula $x^{-1} := \mathrm{LIM}_{n\to\infty} a_n^{-1}$.
(From Lemma 5.3.15 we know that $x^{-1}$ is a real number.)

We need to check one thing before we are sure this definition
makes sense: what if there are two different Cauchy sequences $(a_n)_{n=1}^\infty$
and $(b_n)_{n=1}^\infty$ which have $x$ as their formal limit,
$x = \mathrm{LIM}_{n\to\infty} a_n = \mathrm{LIM}_{n\to\infty} b_n$? The above definition might conceivably give two different
reciprocals $x^{-1}$, namely $\mathrm{LIM}_{n\to\infty} a_n^{-1}$ and $\mathrm{LIM}_{n\to\infty} b_n^{-1}$. Fortunately,
this never happens:

Lemma 5.3.17 (Reciprocation is well defined). Let $(a_n)_{n=1}^\infty$ and
$(b_n)_{n=1}^\infty$ be two Cauchy sequences bounded away from zero such that
$\mathrm{LIM}_{n\to\infty} a_n = \mathrm{LIM}_{n\to\infty} b_n$ (i.e., the two sequences are equivalent).
Then $\mathrm{LIM}_{n\to\infty} a_n^{-1} = \mathrm{LIM}_{n\to\infty} b_n^{-1}$.

Proof. Consider the following product $P$ of three real numbers:

$P := (\mathrm{LIM}_{n\to\infty} a_n^{-1}) \times (\mathrm{LIM}_{n\to\infty} a_n) \times (\mathrm{LIM}_{n\to\infty} b_n^{-1})$.

If we multiply this out, we obtain

$P = \mathrm{LIM}_{n\to\infty} a_n^{-1} a_n b_n^{-1} = \mathrm{LIM}_{n\to\infty} b_n^{-1}$.

On the other hand, since $\mathrm{LIM}_{n\to\infty} a_n = \mathrm{LIM}_{n\to\infty} b_n$, we can write $P$ in
another way as

$P = (\mathrm{LIM}_{n\to\infty} a_n^{-1}) \times (\mathrm{LIM}_{n\to\infty} b_n) \times (\mathrm{LIM}_{n\to\infty} b_n^{-1})$

(cf. Proposition 5.3.10). Multiplying things out again, we get

$P = \mathrm{LIM}_{n\to\infty} a_n^{-1} b_n b_n^{-1} = \mathrm{LIM}_{n\to\infty} a_n^{-1}$.

Comparing our different formulae for $P$ we see that
$\mathrm{LIM}_{n\to\infty} a_n^{-1} = \mathrm{LIM}_{n\to\infty} b_n^{-1}$, as desired. □

Thus reciprocal is well-defined (for each non-zero real number $x$,
we have exactly one definition of the reciprocal $x^{-1}$). Note it is clear
from the definition that $x x^{-1} = x^{-1} x = 1$ (why?); thus all the field
axioms (Proposition 4.2.4) apply to the reals as well as to the rationals.
We of course cannot give 0 a reciprocal, since 0 multiplied by anything
gives 0, not 1. Also note that if $q$ is a non-zero rational, and hence
equal to the real number $\mathrm{LIM}_{n\to\infty} q$, then the reciprocal of $\mathrm{LIM}_{n\to\infty} q$
is $\mathrm{LIM}_{n\to\infty} q^{-1} = q^{-1}$; thus the operation of reciprocal on real numbers
is consistent with the operation of reciprocal on rational numbers.

Once one has reciprocal, one can define division $x/y$ of two real
numbers $x$, $y$, provided $y$ is non-zero, by the formula

$x/y := x \times y^{-1}$,

just as we did with the rationals. In particular, we have the cancellation 
law : if x, y, z are real numbers such that xz = yz, and z is non-zero, then 
by dividing by z we conclude that x = y. Note that this cancellation 
law does not work when z is zero. 

We now have all four of the basic arithmetic operations on the reals: 
addition, subtraction, multiplication, and division, with all the usual 
rules of algebra. Next we turn to the notion of order on the reals. 

— Exercises — 

Exercise 5.3.1. Prove Proposition 5.3.3. (Hint: you may find Proposition 4.3.7 
to be useful.) 

Exercise 5.3.2. Prove Proposition 5.3.10. (Hint: again, Proposition 4.3.7 may 
be useful.) 

Exercise 5.3.3. Let $a$, $b$ be rational numbers. Show that $a = b$ if and only if
$\mathrm{LIM}_{n\to\infty} a = \mathrm{LIM}_{n\to\infty} b$ (i.e., the Cauchy sequences $a, a, a, a, \ldots$ and $b, b, b, b, \ldots$
are equivalent if and only if $a = b$). This allows us to embed the rational numbers
inside the real numbers in a well-defined manner.

Exercise 5.3.4. Let $(a_n)_{n=0}^\infty$ be a sequence of rational numbers which is
bounded. Let $(b_n)_{n=0}^\infty$ be another sequence of rational numbers which is
equivalent to $(a_n)_{n=0}^\infty$. Show that $(b_n)_{n=0}^\infty$ is also bounded. (Hint: use Exercise
5.2.2.)

Exercise 5.3.5. Show that $\mathrm{LIM}_{n\to\infty} 1/n = 0$.

5.4 Ordering the reals 

We know that every rational number is positive, negative, or zero. We 
now want to say the same thing for the reals: each real number should 
be positive, negative, or zero. Since a real number x is just a formal 
limit of rationals $a_n$, it is tempting to make the following definition: a
real number $x = \mathrm{LIM}_{n\to\infty} a_n$ is positive if all of the $a_n$ are positive,
and negative if all of the $a_n$ are negative (and zero if all of the $a_n$ are
zero). However, one soon realizes some problems with this definition.
For instance, the sequence $(a_n)_{n=1}^\infty$ defined by $a_n := 10^{-n}$, thus

0.1, 0.01, 0.001, 0.0001, ...

consists entirely of positive numbers, but this sequence is equivalent to
the zero sequence 0, 0, 0, 0, ... and thus $\mathrm{LIM}_{n\to\infty} a_n = 0$. Thus even
though all the rationals were positive, the real formal limit of these
rationals was zero rather than positive. Another example is

0.1, -0.01, 0.001, -0.0001, ...;

this sequence is a hybrid of positive and negative numbers, but again
the formal limit is zero.

The trick, as with the reciprocals in the previous section, is to limit 
one’s attention to sequences which are bounded away from zero. 

Definition 5.4.1. Let $(a_n)_{n=1}^\infty$ be a sequence of rationals. We say that
this sequence is positively bounded away from zero iff we have a positive
rational $c > 0$ such that $a_n \geq c$ for all $n \geq 1$ (in particular, the sequence
is entirely positive). The sequence is negatively bounded away from zero
iff we have a negative rational $-c < 0$ such that $a_n \leq -c$ for all $n \geq 1$
(in particular, the sequence is entirely negative).

Examples 5.4.2. The sequence 1.1, 1.01, 1.001, 1.0001, ... is positively
bounded away from zero (all terms are greater than or equal to 1). The
sequence $-1.1, -1.01, -1.001, -1.0001, \ldots$ is negatively bounded away
from zero. The sequence $1, -1, 1, -1, 1, -1, \ldots$ is bounded away from
zero, but is neither positively bounded away from zero nor negatively
bounded away from zero.

It is clear that any sequence which is positively or negatively bounded 
away from zero, is bounded away from zero. Also, a sequence cannot be 
both positively bounded away from zero and negatively bounded away 
from zero at the same time. 

Definition 5.4.3. A real number $x$ is said to be positive iff it can be
written as $x = \mathrm{LIM}_{n\to\infty} a_n$ for some Cauchy sequence $(a_n)_{n=1}^\infty$ which
is positively bounded away from zero. $x$ is said to be negative iff it
can be written as $x = \mathrm{LIM}_{n\to\infty} a_n$ for some Cauchy sequence $(a_n)_{n=1}^\infty$ which is
negatively bounded away from zero.

Proposition 5.4.4 (Basic properties of positive reals). For every real
number $x$, exactly one of the following three statements is true: (a) $x$ is
zero; (b) $x$ is positive; (c) $x$ is negative. A real number $x$ is negative if
and only if $-x$ is positive. If $x$ and $y$ are positive, then so are $x + y$ and
$xy$.

Proof. See Exercise 5.4.1. □ 

Note that if $q$ is a positive rational number, then the Cauchy sequence
$q, q, q, \ldots$ is positively bounded away from zero, and hence $\mathrm{LIM}_{n\to\infty} q = q$
is a positive real number. Thus the notion of positivity for rationals
is consistent with that for reals. Similarly, the notion of negativity for
rationals is consistent with that for reals.

Once we have defined positive and negative numbers, we can define 
absolute value and order. 

Definition 5.4.5 (Absolute value). Let x be a real number. We define 
the absolute value |x| of x to equal x if x is positive, — x when x is 
negative, and 0 when x is zero. 

Definition 5.4.6 (Ordering of the real numbers). Let $x$ and $y$ be real
numbers. We say that $x$ is greater than $y$, and write $x > y$, iff $x - y$ is a
positive real number, and $x < y$ iff $x - y$ is a negative real number. We
define $x \geq y$ iff $x > y$ or $x = y$, and similarly define $x \leq y$.

Comparing this with the definition of order on the rationals from
Definition 4.2.8 we see that order on the reals is consistent with order
on the rationals, i.e., if two rational numbers $q$, $q'$ are such that $q$ is less
than $q'$ in the rational number system, then $q$ is still less than $q'$ in the
real number system, and similarly for “greater than”. In the same way
we see that the definition of absolute value given here is consistent with
that in Definition 4.3.1.

Proposition 5.4.7. All the claims in Proposition 4.2.9 which held for
rationals, continue to hold for real numbers.

Proof. We just prove one of the claims and leave the rest to Exercise 
5.4.2. Suppose we have x < y and z a positive real, and want to conclude 
that xz < yz. Since x < y, y — x is positive, hence by Proposition 5.4.4 
we have (y — x)z = yz — xz is positive, hence xz < yz. □ 

As an application of these propositions, we prove 

Proposition 5.4.8. Let x be a positive real number. Then x _1 is also 
positive. Also, if y is another positive number and x > y, then re -1 < 

y _1 - 

Proof. Let x be positive. Since xx^{-1} = 1, the real number x^{-1} cannot be 
zero (since x0 = 0 ≠ 1). Also, from Proposition 5.4.4 it is easy to see that 
a positive number times a negative number is negative; this shows that 
x^{-1} cannot be negative, since this would imply that xx^{-1} = 1 is negative, 
a contradiction. Thus, by Proposition 5.4.4, the only possibility left is 
that x^{-1} is positive. 

Now let y be positive as well, so x^{-1} and y^{-1} are also positive. If 
x^{-1} ≥ y^{-1}, then by Proposition 5.4.7 we have xx^{-1} > yx^{-1} ≥ yy^{-1}, 
thus 1 > 1, which is a contradiction. Thus we must have x^{-1} < y^{-1}. □ 
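As a quick sanity check of Proposition 5.4.8, the following sketch tests the claim on positive rationals, which stand in for positive reals here. The function name is hypothetical, and a check on sample values is of course not a proof.

```python
from fractions import Fraction

def reciprocal_reverses_order(x: Fraction, y: Fraction) -> bool:
    """Check Proposition 5.4.8 for one pair of positive rationals x > y."""
    if not (x > y > 0):
        raise ValueError("need x > y > 0")
    # Both reciprocals should be positive, and the order should flip.
    return 1 / x > 0 and 1 / y > 0 and 1 / x < 1 / y

print(reciprocal_reverses_order(Fraction(7, 2), Fraction(1, 3)))
```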

Another application is that the laws of exponentiation (Proposition 
4.3.12) that were previously proven for rationals, are also true for reals; 
see Section 5.6. 

We have already seen that the formal limit of positive rationals need 
not be positive; it could be zero, as the example 0.1,0.01,0.001,... 
showed. However, the formal limit of non-negative rationals (i.e., rationals 
that are either positive or zero) is non-negative. 

Proposition 5.4.9 (The non-negative reals are closed). Let a_1, a_2, 
a_3, ... be a Cauchy sequence of non-negative rational numbers. Then 
LIM_{n→∞} a_n is a non-negative real number. 

Eventually, we will see a better explanation of this fact: the set of 
non-negative reals is closed, whereas the set of positive reals is open. See 
Section 11.4. 

Proof. We argue by contradiction, and suppose that the real number 
x := LIM_{n→∞} a_n is a negative number. Then by definition of negative 
real number, we have x = LIM_{n→∞} b_n for some sequence b_n which is 
negatively bounded away from zero, i.e., there is a negative rational 
-c < 0 such that b_n ≤ -c for all n ≥ 1. On the other hand, we have 
a_n ≥ 0 for all n ≥ 1, by hypothesis. Thus the numbers a_n and b_n are 
never c/2-close, since c/2 < c. Thus the sequences (a_n)_{n=1}^∞ and (b_n)_{n=1}^∞ 
are not eventually c/2-close. Since c/2 > 0, this implies that (a_n)_{n=1}^∞ 
and (b_n)_{n=1}^∞ are not equivalent. But this contradicts the fact that both 
these sequences have x as their formal limit. □ 

Corollary 5.4.10. Let (a_n)_{n=1}^∞ and (b_n)_{n=1}^∞ be Cauchy sequences of 
rationals such that a_n ≥ b_n for all n ≥ 1. Then LIM_{n→∞} a_n ≥ LIM_{n→∞} b_n. 

Proof. Apply Proposition 5.4.9 to the sequence a_n - b_n. □ 

Remark 5.4.11. Note that the above Corollary does not work if the ≥ 
signs are replaced by >: for instance if a_n := 1 + 1/n and b_n := 1 - 1/n, 
then a_n is always strictly greater than b_n, but the formal limit of a_n is 
not greater than the formal limit of b_n; instead they are equal. 
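The example in Remark 5.4.11 can be explored numerically. The sketch below (helper names are illustrative, not from the text) confirms that a_n > b_n for every sampled n while the gap a_n - b_n = 2/n becomes smaller than any fixed positive rational, which is why the two formal limits coincide.

```python
from fractions import Fraction

def a(n: int) -> Fraction:
    return 1 + Fraction(1, n)

def b(n: int) -> Fraction:
    return 1 - Fraction(1, n)

def gap(n: int) -> Fraction:
    """a_n - b_n, which equals 2/n."""
    return a(n) - b(n)

# Strict inequality holds termwise, yet the gap shrinks towards zero.
for n in (1, 10, 100, 1000):
    print(n, a(n) > b(n), gap(n))
```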

We now define distance d(x,y) := |x - y| just as we did for the 
rationals. In fact, Propositions 4.3.3 and 4.3.7 hold not only for the 
rationals, but for the reals; the proof is identical, since the real numbers 
obey all the laws of algebra and order that the rationals do. 

We now observe that while positive real numbers can be arbitrarily 
large or small, they cannot be larger than all of the positive integers, or 
smaller in magnitude than all of the positive rationals: 

Proposition 5.4.12 (Bounding of reals by rationals). Let x be a positive 
real number. Then there exists a positive rational number q such that 
q ≤ x, and there exists a positive integer N such that x ≤ N. 

Proof. Since x is a positive real, it is the formal limit of some Cauchy 
sequence (a_n)_{n=1}^∞ which is positively bounded away from zero. Also, by 
Lemma 5.1.15, this sequence is bounded. Thus we have rationals q > 0 
and r such that q ≤ a_n ≤ r for all n ≥ 1. But by Proposition 4.4.1 we 
know that there is some integer N such that r ≤ N; since q is positive 
and q ≤ r ≤ N, we see that N is positive. Thus q ≤ a_n ≤ N for 
all n ≥ 1. Applying Corollary 5.4.10 we obtain that q ≤ x ≤ N, as 
desired. □ 

Corollary 5.4.13 (Archimedean property). Let x and ε be any positive 
real numbers. Then there exists a positive integer M such that Mε > x. 

Proof. The number x/ε is positive, and hence by Proposition 5.4.12 
there exists a positive integer N such that x/ε ≤ N. If we set M := 
N + 1, then x/ε < M. Now multiply by ε. □ 

This property is quite important; it says that no matter how large x 
is and how small ε is, if one keeps adding ε to itself, one will eventually 
overtake x. 
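The proof of the Archimedean property is constructive, and a minimal sketch of it for rational inputs follows. Rationals stand in for reals, and `archimedean_multiple` is a hypothetical name: it takes N to be the ceiling of x/ε (so that x/ε ≤ N) and returns M := N + 1, as in the proof.

```python
import math
from fractions import Fraction

def archimedean_multiple(x: Fraction, eps: Fraction) -> int:
    """Return a positive integer M with M * eps > x, following Corollary 5.4.13."""
    if x <= 0 or eps <= 0:
        raise ValueError("x and eps must be positive")
    n = math.ceil(x / eps)  # a positive integer N with x/eps <= N
    return n + 1            # M := N + 1, so x/eps < M

x, eps = Fraction(1000), Fraction(1, 7)
M = archimedean_multiple(x, eps)
print(M, M * eps > x)
```

The value returned is not claimed to be the smallest such M; the proof only needs some M to exist.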

Proposition 5.4.14. Given any two real numbers x < y, we can find a 
rational number q such that x < q < y. 

Proof. See Exercise 5.4.5. □ 
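To see how such a rational can be found in practice, here is one possible sketch for rational endpoints (an assumption made so that the arithmetic is exact; for rationals the midpoint would also work, but the steps below are chosen to mirror the hint in Exercise 5.4.5). It first picks N with 1/N < y - x, then takes the first multiple of 1/N that exceeds x. The function name is hypothetical.

```python
import math
from fractions import Fraction

def rational_between(x: Fraction, y: Fraction) -> Fraction:
    """Return a rational q with x < q < y, assuming x < y."""
    if not x < y:
        raise ValueError("need x < y")
    n = math.floor(1 / (y - x)) + 1  # n > 1/(y - x), hence 1/n < y - x
    m = math.floor(x * n) + 1        # m - 1 <= x*n < m, hence x < m/n
    # m/n <= x + 1/n < y, so m/n lies strictly between x and y.
    return Fraction(m, n)

print(rational_between(Fraction(1, 3), Fraction(1, 2)))
```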

We have now completed our construction of the real numbers. This 
number system contains the rationals, and has almost everything that 
the rational number system has: the arithmetic operations, the laws of 
algebra, the laws of order. However, we have not yet demonstrated any 
advantages that the real numbers have over the rationals; so far, even 
after much effort, all we have done is shown that they are at least as good 
as the rational number system. But in the next few sections we show 
that the real numbers can do more things than rationals: for example, 
we can take square roots in a real number system. 

Remark 5.4.15. Up until now, we have not addressed the fact that 
real numbers can be expressed using the decimal system. For instance, 
the formal limit of 


1.4, 1.41, 1.414, 1.4142, 1.41421, . . . 

is more conventionally represented as the decimal 1.41421.... We will 
address this in an Appendix (§B), but for now let us just remark that 
there are some subtleties in the decimal system, for instance 0.9999... 
and 1.000... are in fact the same real number. 

— Exercises — 

Exercise 5.4.1. Prove Proposition 5.4.4. (Hint: if x is not zero, and x is the 
formal limit of some sequence (a_n)_{n=1}^∞, then this sequence cannot be eventually 
ε-close to the zero sequence (0)_{n=1}^∞ for every single ε > 0. Use this to show 
that the sequence (a_n)_{n=1}^∞ is eventually either positively bounded away from 
zero or negatively bounded away from zero.) 

Exercise 5.4.2. Prove the remaining claims in Proposition 5.4.7. 

Exercise 5.4.3. Show that for every real number x there is exactly one integer 
N such that N ≤ x < N + 1. (This integer N is called the integer part of x, 
and is sometimes denoted N = ⌊x⌋.) 

Exercise 5.4.4. Show that for any positive real number x > 0 there exists a 
positive integer N such that x > 1/N > 0. 

Exercise 5.4.5. Prove Proposition 5.4.14. (Hint: use Exercise 5.4.4. You may 
also need to argue by contradiction.) 

Exercise 5.4.6. Let x, y be real numbers and let ε > 0 be a positive real. Show 
that |x - y| < ε if and only if y - ε < x < y + ε, and that |x - y| ≤ ε if and 
only if y - ε ≤ x ≤ y + ε. 

Exercise 5.4.7. Let x and y be real numbers. Show that x ≤ y + ε for all real 
numbers ε > 0 if and only if x ≤ y. Show that |x - y| ≤ ε for all real numbers 
ε > 0 if and only if x = y. 

Exercise 5.4.8. Let (a_n)_{n=1}^∞ be a Cauchy sequence of rationals, and let x be 
a real number. Show that if a_n ≤ x for all n ≥ 1, then LIM_{n→∞} a_n ≤ x. 
Similarly, show that if a_n ≥ x for all n ≥ 1, then LIM_{n→∞} a_n ≥ x. (Hint: prove 
by contradiction. Use Proposition 5.4.14 to find a rational between LIM_{n→∞} a_n 
and x, and then use Proposition 5.4.9 or Corollary 5.4.10.) 


5.5 The least upper bound property 

We now give one of the most basic advantages of the real numbers over 
the rationals; one can take the least upper bound sup (E) of any subset 
E of the real numbers R. 

Definition 5.5.1 (Upper bound). Let E be a subset of R, and let M 
be a real number. We say that M is an upper bound for E, iff we have 
x ≤ M for every element x in E. 

Example 5.5.2. Let E be the interval E := {x ∈ R : 0 ≤ x ≤ 1}. Then 
1 is an upper bound for E, since every element of E is less than or equal 
to 1. It is also true that 2 is an upper bound for E, and indeed every 
number greater or equal to 1 is an upper bound for E. On the other 
hand, any other number, such as 0.5, is not an upper bound, because 0.5 
is not larger than every element in E. (Merely being larger than some 
elements of E is not necessarily enough to make 0.5 an upper bound.) 

Example 5.5.3. Let R^+ be the set of positive reals: R^+ := {x ∈ R : 
x > 0}. Then R^+ does not have any upper bounds³ at all (why?). 

Example 5.5.4. Let ∅ be the empty set. Then every number M is 
an upper bound for ∅, because M is greater than every element of the 
empty set (this is a vacuously true statement, but still true). 

It is clear that if M is an upper bound of E, then any larger number 
M' ≥ M is also an upper bound of E. On the other hand, it is not so 
clear whether it is also possible for any number smaller than M to also 
be an upper bound of E. This motivates the following definition: 

Definition 5.5.5 (Least upper bound). Let E be a subset of R, and M 
be a real number. We say that M is a least upper bound for E iff (a) M 
is an upper bound for E, and also (b) any other upper bound M' for E 
must be larger than or equal to M. 

Example 5.5.6. Let E be the interval E := {x ∈ R : 0 ≤ x ≤ 1}. 
Then, as noted before, E has many upper bounds, indeed every number 
greater than or equal to 1 is an upper bound. But only 1 is the least 
upper bound; all other upper bounds are larger than 1. 

Example 5.5.7. The empty set does not have a least upper bound 
(why?). 

Proposition 5.5.8 (Uniqueness of least upper bound). Let E be a sub- 
set of R. Then E can have at most one least upper bound. 

Proof. Let M_1 and M_2 be two least upper bounds of E. 
Since M_1 is a least upper bound and M_2 is an upper bound, then by 
definition of least upper bound we have M_2 ≥ M_1. Since M_2 is a least 
upper bound and M_1 is an upper bound, we similarly have M_1 ≥ M_2. 
Thus M_1 = M_2. Thus there is at most one least upper bound. □ 

Now we come to an important property of the real numbers: 

Theorem 5.5.9 (Existence of least upper bound). Let E be a non-empty 
subset of R. If E has an upper bound (i.e., E has some upper 
bound M), then it must have exactly one least upper bound. 


3 More precisely, R^+ has no upper bounds which are real numbers. In Section 6.2 
we shall introduce the extended real number system R*, which allows one to give the 
upper bound of +∞ for sets such as R^+. 

Proof. This theorem will take quite a bit of effort to prove, and many 
of the steps will be left as exercises. 

Let E be a non-empty subset of R with an upper bound M. By 
Proposition 5.5.8, we know that E has at most one least upper bound; 
we have to show that E has at least one least upper bound. Since E is 
non-empty, we can choose some element x_0 in E. 

Let n ≥ 1 be a positive integer. We know that E has an upper 
bound M. By the Archimedean property (Corollary 5.4.13), we can find 
an integer K such that K/n > M, and hence K/n is also an upper 
bound for E. By the Archimedean property again, there exists another 
integer L such that L/n < x_0. Since x_0 lies in E, we see that L/n is not 
an upper bound for E. Since K/n is an upper bound but L/n is not, 
we see that K > L. 

Since K/n is an upper bound for E and L/n is not, we can find an 
integer L < m_n ≤ K with the property that m_n/n is an upper bound 
for E, but (m_n - 1)/n is not (see Exercise 5.5.2). In fact, this integer m_n 
is unique (Exercise 5.5.3). We subscript m_n by n to emphasize the fact 
that this integer m depends on the choice of n. This gives a well-defined 
(and unique) sequence m_1, m_2, m_3, ... of integers, with each of the m_n/n 
being upper bounds and each of the (m_n - 1)/n not being upper bounds. 

Now let N ≥ 1 be a positive integer, and let n, n' ≥ N be integers 
larger than or equal to N. Since m_n/n is an upper bound for E and 
(m_{n'} - 1)/n' is not, we must have m_n/n > (m_{n'} - 1)/n' (why?). After 
a little algebra, this implies that 

    m_n/n - m_{n'}/n' > -1/n' ≥ -1/N. 

Similarly, since m_{n'}/n' is an upper bound for E and (m_n - 1)/n is not, 
we have m_{n'}/n' > (m_n - 1)/n, and hence 

    m_n/n - m_{n'}/n' ≤ 1/n ≤ 1/N. 

Putting these two bounds together, we see that 

    |m_n/n - m_{n'}/n'| ≤ 1/N for all n, n' ≥ N ≥ 1. 

This implies that (m_n/n)_{n=1}^∞ is a Cauchy sequence (Exercise 5.5.4). Since the 
m_n/n are rational numbers, we can now define the real number S as 

    S := LIM_{n→∞} m_n/n. 

From Exercise 5.3.5 we conclude that 

    S = LIM_{n→∞} (m_n - 1)/n. 

To finish the proof of the theorem, we need to show that S is the least 
upper bound for E. First we show that it is an upper bound. Let x 
be any element of E. Then, since m_n/n is an upper bound for E, we 
have x ≤ m_n/n for all n ≥ 1. Applying Exercise 5.4.8, we conclude that 
x ≤ LIM_{n→∞} m_n/n = S. Thus S is indeed an upper bound for E. 

Now we show it is a least upper bound. Suppose y is an upper 
bound for E. Since (m_n - 1)/n is not an upper bound, we conclude that 
y ≥ (m_n - 1)/n for all n ≥ 1. Applying Exercise 5.4.8, we conclude that 
y ≥ LIM_{n→∞} (m_n - 1)/n = S. Thus the upper bound S is less than or 
equal to every upper bound of E, and S is thus a least upper bound of 
E. □ 
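The construction of the integers m_n can be made concrete for a specific set. The sketch below assumes E := {y ≥ 0 : y^2 < 2} (the set used later in Proposition 5.5.12), for which m/n is an upper bound exactly when m > 0 and m^2 ≥ 2n^2; this characterization is special to this E and is an assumption of the example, not part of the proof. The search starts at L = 0 and stops by K = 2n, mirroring the roles of L and K above. Function names are illustrative.

```python
from fractions import Fraction

def is_upper_bound(m: int, n: int) -> bool:
    """Is m/n an upper bound for E = {y >= 0 : y^2 < 2}? Assumes n >= 1."""
    # For this particular E, m/n bounds E exactly when m/n > 0 and (m/n)^2 >= 2.
    return m > 0 and m * m >= 2 * n * n

def m_n(n: int) -> int:
    """The integer m with m/n an upper bound for E but (m - 1)/n not."""
    m = 0  # plays the role of L: 0/n is not an upper bound, since 1 lies in E
    while not is_upper_bound(m, n):
        m += 1  # stops no later than m = 2n, which plays the role of K
    return m

# The rationals m_n/n whose formal limit is the least upper bound S.
for n in (1, 10, 100, 1000):
    print(n, Fraction(m_n(n), n), float(Fraction(m_n(n), n)))
```

The printed fractions decrease towards the supremum of E from above, and any two of them with indices at least N differ by at most 1/N, as the proof shows.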


Definition 5.5.10 (Supremum). Let E be a subset of the real numbers. 
If E is non-empty and has some upper bound, we define sup(E) to be 
the least upper bound of E (this is well-defined by Theorem 5.5.9). We 
introduce two additional symbols, +∞ and -∞. If E is non-empty 
and has no upper bound, we set sup(E) := +∞; if E is empty, we set 
sup(E) := -∞. We refer to sup(E) as the supremum of E, and also 
denote it by sup E. 

Remark 5.5.11. At present, +∞ and -∞ are meaningless symbols; we 
have no operations on them at present, and none of our results involving 
real numbers apply to +∞ and -∞, because these are not real numbers. 
In Section 6.2 we add +∞ and -∞ to the reals to form the extended 
real number system, but this system is not as convenient to work with 
as the real number system, because many of the laws of algebra break 
down. For instance, it is not a good idea to try to define +∞ + -∞; 
setting this equal to 0 causes some problems. 

Now we give an example of how the least upper bound property is 
useful. 

Proposition 5.5.12. There exists a positive real number x such that 
x^2 = 2. 

Remark 5.5.13. Comparing this result with Proposition 4.4.4, we 
see that certain numbers are real but not rational. The proof of this 
proposition also shows that the rationals Q do not obey the least upper 
bound property, otherwise one could use that property to construct a 
square root of 2, which by Proposition 4.4.4 is not possible. 

Proof. Let E be the set {y ∈ R : y ≥ 0 and y^2 < 2}; thus E is the set of 
all non-negative real numbers whose square is less than 2. Observe that 
E has an upper bound of 2 (because if y > 2, then y^2 > 4 > 2 and hence 
y ∉ E). Also, E is non-empty (for instance, 1 is an element of E). Thus 
by the least upper bound property, we have a real number x := sup(E) 
which is the least upper bound of E. Then x is greater than or equal to 
1 (since 1 ∈ E) and less than or equal to 2 (since 2 is an upper bound 
for E). So x is positive. Now we show that x^2 = 2. 

We argue this by contradiction. We show that both x^2 < 2 and 
x^2 > 2 lead to contradictions. First suppose that x^2 < 2. Let 0 < ε < 1 
be a small number; then we have 

    (x + ε)^2 = x^2 + 2εx + ε^2 ≤ x^2 + 4ε + ε = x^2 + 5ε 

since x ≤ 2 and ε^2 ≤ ε. Since x^2 < 2, we see that we can choose an 
0 < ε < 1 such that x^2 + 5ε < 2, thus (x + ε)^2 < 2. By construction of 
E, this means that x + ε ∈ E; but this contradicts the fact that x is an 
upper bound of E. 

Now suppose that x^2 > 2. Let 0 < ε < 1 be a small number; then 
we have 

    (x - ε)^2 = x^2 - 2εx + ε^2 ≥ x^2 - 2εx ≥ x^2 - 4ε 

since x ≤ 2 and ε^2 ≥ 0. Since x^2 > 2, we can choose 0 < ε < 1 such that 
x^2 - 4ε > 2, and thus (x - ε)^2 > 2. But then this implies that x - ε ≥ y for 
all y ∈ E. (Why? If x - ε < y then (x - ε)^2 < y^2 < 2, a contradiction.) 
Thus x - ε is an upper bound for E, which contradicts the fact that x is 
the least upper bound of E. From these two contradictions we see that 
x^2 = 2, as desired. □ 
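The two choices of ε in this proof can be made explicit. The following sketch works with rational test values in place of the real number x (an assumption made for exact arithmetic; no rational actually squares to 2). The particular formulas ε = min(1/2, (2 - x^2)/10) and ε = min(1/2, (x^2 - 2)/8) are one possible choice satisfying the requirements of the proof, and the function names are hypothetical.

```python
from fractions import Fraction

def eps_when_square_too_small(x: Fraction) -> Fraction:
    """Given 0 <= x <= 2 with x^2 < 2, return 0 < eps < 1 with x^2 + 5*eps < 2."""
    if not (0 <= x <= 2 and x * x < 2):
        raise ValueError("need 0 <= x <= 2 and x^2 < 2")
    return min(Fraction(1, 2), (2 - x * x) / 10)

def eps_when_square_too_large(x: Fraction) -> Fraction:
    """Given 0 <= x <= 2 with x^2 > 2, return 0 < eps < 1 with x^2 - 4*eps > 2."""
    if not (0 <= x <= 2 and x * x > 2):
        raise ValueError("need 0 <= x <= 2 and x^2 > 2")
    return min(Fraction(1, 2), (x * x - 2) / 8)

low = Fraction(7, 5)   # low^2 = 49/25 < 2
e1 = eps_when_square_too_small(low)
print((low + e1) ** 2 < 2)   # low + e1 still has square below 2

high = Fraction(3, 2)  # high^2 = 9/4 > 2
e2 = eps_when_square_too_large(high)
print((high - e2) ** 2 > 2)  # high - e2 still has square above 2
```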

Remark 5.5.14. In Chapter 6 we will use the least upper bound prop- 
erty to develop the theory of limits, which allows us to do many more 
things than just take square roots. 

Remark 5.5.15. We can of course talk about lower bounds, and greatest 
lower bounds, of sets E; the greatest lower bound of a set E is also 
known as the infimum⁴ of E and is denoted inf(E) or inf E. Everything 
we say about suprema has a counterpart for infima; we will usually leave 
such statements to the reader. A precise relationship between the two 
notions is given by Exercise 5.5.1. See also Section 6.2. 

— Exercises — 

Exercise 5.5.1. Let E be a subset of the real numbers R, and suppose that E 
has a least upper bound M which is a real number, i.e., M = sup(E). Let -E 
be the set 

    -E := {-x : x ∈ E}. 

Show that -M is the greatest lower bound of -E, i.e., -M = inf(-E). 

Exercise 5.5.2. Let E be a non-empty subset of R, let n ≥ 1 be an integer, and 
let L < K be integers. Suppose that K/n is an upper bound for E, but that 
L/n is not an upper bound for E. Without using Theorem 5.5.9, show that 
there exists an integer L < m ≤ K such that m/n is an upper bound for E, but 
that (m - 1)/n is not an upper bound for E. (Hint: prove by contradiction, 
and use induction. It may also help to draw a picture of the situation.) 

Exercise 5.5.3. Let E be a non-empty subset of R, let n ≥ 1 be an integer, and 
let m, m' be integers with the properties that m/n and m'/n are upper bounds 
for E, but (m - 1)/n and (m' - 1)/n are not upper bounds for E. Show that 
m = m'. This shows that the integer m constructed in Exercise 5.5.2 is unique. 
(Hint: again, drawing a picture will be helpful.) 

Exercise 5.5.4. Let q_1, q_2, q_3, ... be a sequence of rational numbers with the 
property that |q_n - q_{n'}| ≤ 1/M whenever M ≥ 1 is an integer and n, n' ≥ M. 
Show that q_1, q_2, q_3, ... is a Cauchy sequence. Furthermore, if S := LIM_{n→∞} q_n, 
show that |q_M - S| ≤ 1/M for every M ≥ 1. (Hint: use Exercise 5.4.8.) 

Exercise 5.5.5. Establish an analogue of Proposition 5.4.14, in which “rational” 
is replaced by “irrational”. 

5.6 Real exponentiation, part I 

In Section 4.3 we defined exponentiation x^n when x is rational and 
n is a natural number, or when x is a non-zero rational and n is an 
integer. Now that we have all the arithmetic operations on the reals 
(and Proposition 5.4.7 assures us that the arithmetic properties of the 
rationals that we are used to, continue to hold for the reals) we can 
similarly define exponentiation of the reals. 

4 Supremum means “highest” and infimum means “lowest”, and the plurals are 
suprema and infima. Supremum is to superior, and infimum to inferior, as maximum 
is to major, and minimum to minor. The root words are “super”, which means 
“above”, and “infer”, which means “below” (this usage only survives in a few rare 
English words such as “infernal”, with the Latin prefix “sub” having mostly replaced 
“infer” in English). 

Definition 5.6.1 (Exponentiating a real by a natural number). Let x 
be a real number. To raise x to the power 0, we define x^0 := 1. Now 
suppose recursively that x^n has been defined for some natural number 
n, then we define x^{n+1} := x^n × x. 

Definition 5.6.2 (Exponentiating a real by an integer). Let x be a 
non-zero real number. Then for any negative integer -n, we define 
x^{-n} := 1/x^n. 
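Definitions 5.6.1 and 5.6.2 translate directly into a recursive function. The sketch below uses exact rationals as stand-ins for real numbers, and `power` is a hypothetical name; it is meant to illustrate the recursion, not to be an efficient implementation.

```python
from fractions import Fraction

def power(x: Fraction, n: int) -> Fraction:
    """Compute x^n for an integer n by the recursion of Definitions 5.6.1 and 5.6.2."""
    if n == 0:
        return Fraction(1)          # x^0 := 1 (this includes 0^0 = 1)
    if n > 0:
        return power(x, n - 1) * x  # x^{n+1} := x^n * x
    if x == 0:
        raise ZeroDivisionError("negative powers require non-zero x")
    return 1 / power(x, -n)         # x^{-n} := 1/x^n

print(power(Fraction(2, 3), 3), power(Fraction(2, 3), -2))
```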

Clearly these definitions are consistent with the definition of rational 
exponentiation given earlier. We can then assert 

Proposition 5.6.3. All the properties in Propositions 4.3.10 and 4.3.12 
remain valid if x and y are assumed to be real numbers instead of rational 
numbers. 

Instead of giving an actual proof of this proposition, we shall give a 
meta-proof (an argument appealing to the nature of proofs, rather than 
the nature of real and rational numbers). 

Meta-proof. If one inspects the proof of Propositions 4.3.10 and 4.3.12 
we see that they rely on the laws of algebra and the laws of order for 
the rationals (Propositions 4.2.4 and 4.2.9). But by Propositions 5.3.11, 
5.4.7, and the identity xx^{-1} = x^{-1}x = 1 we know that all these laws of 
algebra and order continue to hold for real numbers as well as rationals. 
Thus we can modify the proof of Proposition 4.3.10 and 4.3.12 to hold 
in the case when x and y are real. □ 

Now we consider exponentiation to exponents which are not integers. 
We begin with the notion of an nth root, which we can define using our 
notion of supremum. 

Definition 5.6.4. Let x ≥ 0 be a non-negative real, and let n ≥ 1 be a 
positive integer. We define x^{1/n}, also known as the nth root of x, by the 
formula 

    x^{1/n} := sup{y ∈ R : y ≥ 0 and y^n ≤ x}. 

We often write √x for x^{1/2}. 

Note we do not define the nth root of a negative number. In fact, 
we will leave the nth roots of negative numbers undefined for the rest 
of the text (one can define these nth roots once one defines the complex 
numbers, but we shall refrain from doing so). 

Lemma 5.6.5 (Existence of nth roots). Let x ≥ 0 be a non-negative 
real, and let n ≥ 1 be a positive integer. Then the set E := {y ∈ R : y ≥ 
0 and y^n ≤ x} is non-empty and is also bounded above. In particular, 
x^{1/n} is a real number. 

Proof. The set E contains 0 (why?), so it is certainly not empty. Now 
we show it has an upper bound. We divide into two cases: x ≤ 1 and 
x > 1. First suppose that we are in the case where x ≤ 1. Then we 
claim that the set E is bounded above by 1. To see this, suppose for sake 
of contradiction that there was an element y ∈ E for which y > 1. But 
then y^n > 1 (why?), and hence y^n > x, a contradiction. Thus E has an 
upper bound. Now suppose that we are in the case where x > 1. Then 
we claim that the set E is bounded above by x. To see this, suppose for 
contradiction that there was an element y ∈ E for which y > x. Since 
x > 1, we thus have y > 1. Since y > x and y > 1, we have y^n > x 
(why?), a contradiction. Thus in both cases E has an upper bound, and 
so x^{1/n} is finite. □ 
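The supremum in Definition 5.6.4 can be bracketed numerically. The sketch below is a bisection, which is not the method of the text but uses its ingredients: 0 lies in E, and max(1, x) is an upper bound for E by the two cases of the proof above. It assumes a rational x ≥ 0 and an integer n ≥ 1; `root_bracket` and the `steps` parameter are hypothetical names.

```python
from fractions import Fraction

def root_bracket(x: Fraction, n: int, steps: int = 40):
    """Return (lo, hi) with lo in E and hi an upper bound for
    E = {y >= 0 : y^n <= x}, so that lo <= x^(1/n) <= hi."""
    if x < 0 or n < 1:
        raise ValueError("need x >= 0 and n >= 1")
    lo = Fraction(0)          # 0 is in E because 0^n = 0 <= x
    hi = max(Fraction(1), x)  # upper bound for E, by Lemma 5.6.5
    for _ in range(steps):
        mid = (lo + hi) / 2
        if mid ** n <= x:
            lo = mid          # mid belongs to E
        else:
            hi = mid          # mid^n > x, so mid is an upper bound for E
    return lo, hi

lo, hi = root_bracket(Fraction(2), 2)
print(float(lo), float(hi))
```

Each step halves the width of the bracket, so after `steps` iterations the supremum is pinned down to within max(1, x)/2^steps.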

We list some basic properties of nth roots below. 

Lemma 5.6.6. Let x, y ≥ 0 be non-negative reals, and let n, m ≥ 1 be 
positive integers. 

(a) If y = x^{1/n}, then y^n = x. 

(b) Conversely, if y^n = x, then y = x^{1/n}. 

(c) x^{1/n} is a positive real number. 

(d) We have x > y if and only if x^{1/n} > y^{1/n}. 

(e) If x > 1, then x^{1/k} is a decreasing function of k. If x < 1, then 
x^{1/k} is an increasing function of k. If x = 1, then x^{1/k} = 1 for all 
k. 

(f) We have (xy)^{1/n} = x^{1/n} y^{1/n}. 

(g) We have (x^{1/n})^{1/m} = x^{1/nm}. 

Proof. See Exercise 5.6.1. □ 

The observant reader may note that this definition of x^{1/n} might 
possibly be inconsistent with our previous notion of x^n when n = 1, but 
it is easy to check that x^{1/1} = x = x^1 (why?), so there is no inconsistency. 

One consequence of Lemma 5.6.6(b) is the following cancellation law: 
if y and z are positive and y^n = z^n, then y = z. (Why does this follow 
from Lemma 5.6.6(b)?) Note that this only works when y and z are 
positive; for instance, (-3)^2 = 3^2, but we cannot conclude from this 
that -3 = 3. 

Now we define how to raise a positive number x to a rational expo- 
nent q. 

Definition 5.6.7. Let x > 0 be a positive real number, and let q be a 
rational number. To define x^q, we write q = a/b for some integer a and 
positive integer b, and define 

    x^q := (x^{1/b})^a. 

Note that every rational q, whether positive, negative, or zero, can 
be written in the form a/b where a is an integer and b is positive (why?). 
However, the rational number q can be expressed in the form a/b in more 
than one way, for instance 1/2 can also be expressed as 2/4 or 3/6. So to 
ensure that this definition is well-defined, we need to check that different 
expressions a/b give the same formula for x^q: 

Lemma 5.6.8. Let a, a' be integers and b, b' be positive integers such 
that a/b = a'/b', and let x be a positive real number. Then we have 
(x^{1/b'})^{a'} = (x^{1/b})^a. 

Proof. There are three cases: a = 0, a > 0, a < 0. If a = 0, then we 
must have a' = 0 (why?) and so both (x^{1/b'})^{a'} and (x^{1/b})^a are equal to 
1, so we are done. 

Now suppose that a > 0. Then a' > 0 (why?), and ab' = ba'. Write 
y := x^{1/(ab')} = x^{1/(ba')}. By Lemma 5.6.6(g) we have y = (x^{1/b'})^{1/a} 
and y = (x^{1/b})^{1/a'}; by Lemma 5.6.6(a) we thus have y^a = x^{1/b'} and 
y^{a'} = x^{1/b}. Thus we have 

    (x^{1/b'})^{a'} = (y^a)^{a'} = y^{aa'} = (y^{a'})^a = (x^{1/b})^a 

as desired. 

Finally, suppose that a < 0. Then we have (-a)/b = (-a')/b'. But 
-a is positive, so the previous case applies and we have (x^{1/b'})^{-a'} = 
(x^{1/b})^{-a}. Taking the reciprocal of both sides we obtain the result. □ 
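Lemma 5.6.8 can be illustrated on a case where every root is an exact integer. The sketch below assumes x is a positive integer that is a perfect b-th power for each denominator used (here x = 4096 = 2^12), so that (x^{1/b})^a can be computed without rounding; both function names are hypothetical helpers, not notation from the text.

```python
from fractions import Fraction

def exact_root(x: int, b: int) -> int:
    """Return the integer r >= 0 with r**b == x; raise if x is not a perfect b-th power."""
    r = 0
    while r ** b < x:
        r += 1
    if r ** b != x:
        raise ValueError("x is not a perfect b-th power")
    return r

def rational_power(x: int, a: int, b: int) -> Fraction:
    """Compute (x^(1/b))^a for an integer a and a positive integer b."""
    return Fraction(exact_root(x, b)) ** a

# 1/2, 2/4 and 3/6 are the same rational, and give the same power of 4096.
print(rational_power(4096, 1, 2), rational_power(4096, 2, 4), rational_power(4096, 3, 6))
```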

Thus x^q is well-defined for every rational q. Note that this new 
definition is consistent with our old definition for x^{1/n} (why?) and is 
also consistent with our old definition for x^n (why?). 

Some basic facts about rational exponentiation: 

Lemma 5.6.9. Let x, y > 0 be positive reals, and let q, r be rationals. 

(a) x^q is a positive real. 

(b) x^{q+r} = x^q x^r and (x^q)^r = x^{qr}. 

(c) x^{-q} = 1/x^q. 

(d) If q > 0, then x > y if and only if x^q > y^q. 

(e) If x > 1, then x^q > x^r if and only if q > r. If x < 1, then x^q > x^r 
if and only if q < r. 

Proof. See Exercise 5.6.2. □ 

We still have to do real exponentiation; in other words, we still have 
to define x^y where x > 0 and y is a real number, but we will defer that 
until Section 6.7, once we have formalized the concept of limit. 

In the rest of the text we shall now just assume the real numbers to 
obey all the usual laws of algebra, order, and exponentiation. 

— Exercises — 

Exercise 5.6.1. Prove Lemma 5.6.6. (Hints: review the proof of Proposition 
5.5.12. Also, you will find proof by contradiction a useful tool, especially when 
combined with the trichotomy of order in Proposition 5.4.7 and Proposition 
5.4.12. The earlier parts of the lemma can be used to prove later parts of the 
lemma. With part (e), first show that if x > 1 then x^{1/n} > 1, and if x < 1 
then x^{1/n} < 1.) 

Exercise 5.6.2. Prove Lemma 5.6.9. (Hint: you should rely mainly on Lemma 
5.6.6 and on algebra.) 

Exercise 5.6.3. If x is a real number, show that |x| = (x^2)^{1/2}. 


Limits of sequences 


6.1 Convergence and limit laws 

In the previous chapter, we defined the real numbers as formal limits 
of rational (Cauchy) sequences, and we then defined various operations 
on the real numbers. However, unlike our work in constructing the 
integers (where we eventually replaced formal differences with actual 
differences) and rationals (where we eventually replaced formal quotients 
with actual quotients), we never really finished the job of constructing 
the real numbers, because we never got around to replacing formal limits 
LIM_{n→∞} a_n with actual limits lim_{n→∞} a_n. In fact, we haven’t defined 
limits at all yet. This will now be rectified. 

We begin by repeating much of the machinery of ε-close sequences, 
etc. again, but this time, we do it for sequences of real numbers, not 
rational numbers. Thus this discussion will supersede what we did in 
the previous chapter. First, we define distance for real numbers: 

Definition 6.1.1 (Distance between two real numbers). Given two real 
numbers x and y, we define their distance d(x, y) to be d(x, y) := |x - y|. 

Clearly this definition is consistent with Definition 4.3.2. Further, 
Proposition 4.3.3 works just as well for real numbers as it does for ra- 
tionals, because the real numbers obey all the rules of algebra that the 
rationals do. 

Definition 6.1.2 (ε-close real numbers). Let ε > 0 be a real number. 
We say that two real numbers x, y are ε-close iff we have d(y, x) ≤ ε. 

Again, it is clear that this definition of ε-close is consistent with 
Definition 4.3.4. 

Now let (a_n)_{n=m}^∞ be a sequence of real numbers; i.e., we assign a 
real number a_n for every integer n ≥ m. The starting index m is some 
integer; usually this will be 1, but in some cases we will start from some 
index other than 1. (The choice of label used to index this sequence is 
unimportant; we could use for instance (a_k)_{k=m}^∞ and this would represent 
exactly the same sequence as (a_n)_{n=m}^∞.) We can define the notion of a 
Cauchy sequence in the same manner as before: 

Definition 6.1.3 (Cauchy sequences of reals). Let ε > 0 be a real 
number. A sequence (a_n)_{n=N}^∞ of real numbers starting at some integer 
index N is said to be ε-steady iff a_j and a_k are ε-close for every j, k ≥ N. 
A sequence (a_n)_{n=m}^∞ starting at some integer index m is said to be 
eventually ε-steady iff there exists an N ≥ m such that (a_n)_{n=N}^∞ is 
ε-steady. We say that (a_n)_{n=m}^∞ is a Cauchy sequence iff it is eventually 
ε-steady for every ε > 0. 
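To make the quantifiers in this definition concrete, consider the sequence a_n := 1/n (an example chosen for this sketch, not taken from the text). For j, k ≥ N we have |a_j - a_k| ≤ 1/N, so any N with 1/N ≤ ε witnesses that the sequence is eventually ε-steady. A program can only inspect finitely many terms, so the check below covers a finite window and is an illustration rather than a verification; the function names are hypothetical.

```python
import math
from fractions import Fraction

def a(n: int) -> Fraction:
    return Fraction(1, n)

def steady_from(eps: Fraction) -> int:
    """An index N such that (a_n) starting at N is eps-steady, using 1/N <= eps."""
    return max(1, math.ceil(1 / eps))

def is_steady_on_window(eps: Fraction, start: int, stop: int) -> bool:
    """Check |a_j - a_k| <= eps for all start <= j, k < stop (a finite window only)."""
    terms = [a(n) for n in range(start, stop)]
    return max(terms) - min(terms) <= eps

eps = Fraction(1, 100)
N = steady_from(eps)
print(N, is_steady_on_window(eps, N, 1000), is_steady_on_window(eps, 1, 1000))
```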

To put it another way, a sequence (a_n)_{n=m}^∞ of real numbers is a 
Cauchy sequence if, for every real ε > 0, there exists an N ≥ m such that 
|a_n - a_{n'}| ≤ ε for all n, n' ≥ N. These definitions are consistent with the 
corresponding definitions for rational numbers (Definitions 5.1.3, 5.1.6, 
5.1.8), although verifying consistency for Cauchy sequences takes a little 
bit of care: 

Proposition 6.1.4. Let (a_n)_{n=m}^∞ be a sequence of rational numbers 
starting at some integer index m. Then (a_n)_{n=m}^∞ is a Cauchy sequence 
in the sense of Definition 5.1.8 if and only if it is a Cauchy sequence in 
the sense of Definition 6.1.3. 

Proof. Suppose first that (a_n)_{n=m}^∞ is a Cauchy sequence in the sense 
of Definition 6.1.3; then it is eventually ε-steady for every real ε > 0. 
In particular, it is eventually ε-steady for every rational ε > 0, which 
makes it a Cauchy sequence in the sense of Definition 5.1.8. 

Now suppose that (a_n)_{n=m}^∞ is a Cauchy sequence in the sense of 
Definition 5.1.8; then it is eventually ε-steady for every rational ε > 0. 
If ε > 0 is a real number, then there exists a rational ε' > 0 which is 
smaller than ε, by Proposition 5.4.12. Since ε' is rational, we know that 
(a_n)_{n=m}^∞ is eventually ε'-steady; since ε' < ε, this implies that (a_n)_{n=m}^∞ 
is eventually ε-steady. Since ε is an arbitrary positive real number, we 
thus see that (a_n)_{n=m}^∞ is a Cauchy sequence in the sense of Definition 
6.1.3. □ 

Because of this proposition, we will no longer care about the distinction 
between Definition 5.1.8 and Definition 6.1.3, and view the concept 
of a Cauchy sequence as a single unified concept. 

Now we talk about what it means for a sequence of real numbers to 
converge to some limit L. 

Definition 6.1.5 (Convergence of sequences). Let ε > 0 be a real number, 
and let L be a real number. A sequence (a_n)_{n=N}^∞ of real numbers 
is said to be ε-close to L iff a_n is ε-close to L for every n ≥ N, i.e., we 
have |a_n - L| ≤ ε for every n ≥ N. We say that a sequence (a_n)_{n=m}^∞ 
is eventually ε-close to L iff there exists an N ≥ m such that (a_n)_{n=N}^∞ 
is ε-close to L. We say that a sequence (a_n)_{n=m}^∞ converges to L iff it is 
eventually ε-close to L for every real ε > 0. 

One can unwrap all the definitions here and write the concept of 
convergence more directly; see Exercise 6.1.2. 

Examples 6.1.6. The sequence 

0.9,0.99,0.999,0.9999,... 

is 0.1-close to 1, but is not 0.01-close to 1, because of the first element of
the sequence. However, it is eventually 0.01-close to 1. In fact, for every
real $\varepsilon > 0$, this sequence is eventually $\varepsilon$-close to 1,
hence is convergent to 1.
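
The example can be checked with exact rational arithmetic. The following is a minimal sketch; the helper names `a` and `first_close_index` are hypothetical and not part of the text. It relies on the observation that the distance from the n-th term to 1 is 10^(-n), which decreases with n, so the first term that is close enough marks the start of a tail that is close to 1.

```python
from fractions import Fraction

def a(n: int) -> Fraction:
    """n-th term (n >= 1) of 0.9, 0.99, 0.999, ...; that is, 1 - 10^(-n)."""
    return 1 - Fraction(1, 10**n)

def first_close_index(eps: Fraction) -> int:
    """Smallest N >= 1 with |a_N - 1| <= eps.

    Since |a_n - 1| = 10^(-n) decreases in n, every later term is then
    eps-close to 1 as well, so the tail starting at N is eps-close to 1.
    """
    n = 1
    while abs(a(n) - 1) > eps:
        n += 1
    return n

print(first_close_index(Fraction(1, 10)))    # 1: the whole sequence is 0.1-close to 1
print(first_close_index(Fraction(1, 100)))   # 2: only the tail from n = 2 is 0.01-close to 1
```

The loop terminates for every positive tolerance, which is the computational counterpart of the statement that the sequence is eventually close to 1 for every tolerance.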

Proposition 6.1.7 (Uniqueness of limits). Let $(a_n)_{n=m}^\infty$ be a real
sequence starting at some integer index $m$, and let $L \neq L'$ be two
distinct real numbers. Then it is not possible for $(a_n)_{n=m}^\infty$ to
converge to $L$ while also converging to $L'$.

Proof. Suppose for sake of contradiction that $(a_n)_{n=m}^\infty$ was
converging to both $L$ and $L'$. Let $\varepsilon = |L - L'|/3$; note that
$\varepsilon$ is positive since $L \neq L'$. Since $(a_n)_{n=m}^\infty$
converges to $L$, we know that $(a_n)_{n=m}^\infty$ is eventually
$\varepsilon$-close to $L$; thus there is an $N \geq m$ such that
$d(a_n, L) \leq \varepsilon$ for all $n \geq N$. Similarly, there is an
$M \geq m$ such that $d(a_n, L') \leq \varepsilon$ for all $n \geq M$. In
particular, if we set $n := \max(N, M)$, then we have
$d(a_n, L) \leq \varepsilon$ and $d(a_n, L') \leq \varepsilon$, hence by the
triangle inequality $d(L, L') \leq 2\varepsilon = 2|L - L'|/3$. But then we
have $|L - L'| \leq 2|L - L'|/3$, which contradicts the fact that
$|L - L'| > 0$. Thus it is not possible to converge to both $L$ and $L'$. □

Now that we know limits are unique, we can set up notation to specify them:

Definition 6.1.8 (Limits of sequences). If a sequence $(a_n)_{n=m}^\infty$
converges to some real number $L$, we say that $(a_n)_{n=m}^\infty$ is
convergent and that its limit is $L$; we write

$$L = \lim_{n \to \infty} a_n$$

to denote this fact. If a sequence $(a_n)_{n=m}^\infty$ is not converging to
any real number $L$, we say that the sequence $(a_n)_{n=m}^\infty$ is
divergent and we leave $\lim_{n \to \infty} a_n$ undefined.

Note that Proposition 6.1.7 ensures that a sequence can have at most 
one limit. Thus, if the limit exists, it is a single real number, otherwise 
it is undefined. 

Remark 6.1.9. The notation $\lim_{n \to \infty} a_n$ does not give any
indication about the starting index $m$ of the sequence, but the starting
index is irrelevant (Exercise 6.1.3). Thus in the rest of this discussion we
shall not be too careful as to where these sequences start, as we shall be
mostly focused on their limits.

We sometimes use the phrase "$a_n \to x$ as $n \to \infty$" as an alternate
way of writing the statement "$(a_n)_{n=m}^\infty$ converges to $x$". Bear in
mind, though, that the individual statements $a_n \to x$ and $n \to \infty$ do
not have any rigorous meaning; this phrase is just a convention, though of
course a very suggestive one.

Remark 6.1.10. The exact choice of letter used to denote the index (in this
case $n$) is irrelevant: the phrase $\lim_{n \to \infty} a_n$ has exactly the
same meaning as $\lim_{k \to \infty} a_k$, for instance. Sometimes it will be
convenient to change the label of the index to avoid conflicts of notation;
for instance, we might want to change $n$ to $k$ because $n$ is simultaneously
being used for some other purpose, and we want to reduce confusion. See
Exercise 6.1.4.

As an example of a limit, we present

Proposition 6.1.11. We have $\lim_{n \to \infty} 1/n = 0$.

Proof. We have to show that the sequence $(a_n)_{n=1}^\infty$ converges to 0,
where $a_n := 1/n$. In other words, for every $\varepsilon > 0$, we need to
show that the sequence $(a_n)_{n=1}^\infty$ is eventually $\varepsilon$-close
to 0. So, let $\varepsilon > 0$ be an arbitrary real number. We have to find
an $N$ such that $|a_n - 0| \leq \varepsilon$ for every $n \geq N$. But if
$n \geq N$, then

$$|a_n - 0| = |1/n - 0| = 1/n \leq 1/N.$$

Thus, if we pick $N > 1/\varepsilon$ (which we can do by the Archimedean
principle), then $1/N < \varepsilon$, and so $(a_n)_{n=N}^\infty$ is
$\varepsilon$-close to 0. Thus $(a_n)_{n=1}^\infty$ is eventually
$\varepsilon$-close to 0. Since $\varepsilon$ was arbitrary,
$(a_n)_{n=1}^\infty$ converges to 0. □
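
The choice of $N$ in this proof is explicit enough to compute. The sketch below assumes a rational tolerance and uses hypothetical helper names (`choose_N`, `tail_is_close`); it picks one integer exceeding the reciprocal of the tolerance and then spot-checks finitely many terms of the tail. It illustrates the proof and does not replace it.

```python
from fractions import Fraction
from math import floor

def choose_N(eps: Fraction) -> int:
    """Return an integer N > 1/eps (one valid choice; any larger N also works)."""
    return floor(1 / eps) + 1

def tail_is_close(eps: Fraction, how_many: int = 1000) -> bool:
    """Check |1/n - 0| <= eps for n = N, ..., N + how_many - 1 (a finite spot check)."""
    N = choose_N(eps)
    return all(Fraction(1, n) <= eps for n in range(N, N + how_many))

print(choose_N(Fraction(1, 100)))        # 101
print(tail_is_close(Fraction(1, 100)))   # True
```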

Proposition 6.1.12 (Convergent sequences are Cauchy). Suppose that
$(a_n)_{n=m}^\infty$ is a convergent sequence of real numbers. Then
$(a_n)_{n=m}^\infty$ is also a Cauchy sequence.

Proof. See Exercise 6.1.5. □ 

Example 6.1.13. The sequence 1, -1, 1, -1, 1, -1, ... is not a Cauchy
sequence (because it is not eventually 1-steady), and is hence not a
convergent sequence, by Proposition 6.1.12.

Remark 6.1.14. For a converse to Proposition 6.1.12, see Theorem 
6.4.18 below. 

Now we show that formal limits can be superseded by actual limits, just as
formal subtraction was superseded by actual subtraction when constructing the
integers, and formal division superseded by actual division when constructing
the rational numbers.

Proposition 6.1.15 (Formal limits are genuine limits). Suppose that
$(a_n)_{n=1}^\infty$ is a Cauchy sequence of rational numbers. Then
$(a_n)_{n=1}^\infty$ converges to $\mathrm{LIM}_{n \to \infty} a_n$, i.e.

$$\mathrm{LIM}_{n \to \infty} a_n = \lim_{n \to \infty} a_n.$$

Proof. See Exercise 6.1.6. □ 

Definition 6.1.16 (Bounded sequences). A sequence $(a_n)_{n=m}^\infty$ of real
numbers is bounded by a real number $M$ iff we have $|a_n| \leq M$ for all
$n \geq m$. We say that $(a_n)_{n=m}^\infty$ is bounded iff it is bounded by
$M$ for some real number $M > 0$.

This definition is consistent with Definition 5.1.12; see Exercise 6.1.7.
Recall from Lemma 5.1.15 that every Cauchy sequence of rational numbers is
bounded. An inspection of the proof of that Lemma shows that the same argument
works for real numbers; every Cauchy sequence of real numbers is bounded. In
particular, from Proposition 6.1.12 we have

Corollary 6.1.17. Every convergent sequence of real numbers is 
bounded. 

Example 6.1.18. The sequence 1, 2, 3, 4, 5, . . . is not bounded, and 
hence is not convergent. 

We can now prove the usual limit laws. 

Theorem 6.1.19 (Limit Laws). Let $(a_n)_{n=m}^\infty$ and
$(b_n)_{n=m}^\infty$ be convergent sequences of real numbers, and let $x, y$
be the real numbers $x := \lim_{n \to \infty} a_n$ and
$y := \lim_{n \to \infty} b_n$.

(a) The sequence $(a_n + b_n)_{n=m}^\infty$ converges to $x + y$; in other
words,

$$\lim_{n \to \infty} (a_n + b_n) = \lim_{n \to \infty} a_n + \lim_{n \to \infty} b_n.$$

(b) The sequence $(a_n b_n)_{n=m}^\infty$ converges to $xy$; in other words,

$$\lim_{n \to \infty} (a_n b_n) = \left(\lim_{n \to \infty} a_n\right)\left(\lim_{n \to \infty} b_n\right).$$

(c) For any real number $c$, the sequence $(c a_n)_{n=m}^\infty$ converges to
$cx$; in other words,

$$\lim_{n \to \infty} (c a_n) = c \lim_{n \to \infty} a_n.$$

(d) The sequence $(a_n - b_n)_{n=m}^\infty$ converges to $x - y$; in other
words,

$$\lim_{n \to \infty} (a_n - b_n) = \lim_{n \to \infty} a_n - \lim_{n \to \infty} b_n.$$

(e) Suppose that $y \neq 0$, and that $b_n \neq 0$ for all $n \geq m$. Then
the sequence $(b_n^{-1})_{n=m}^\infty$ converges to $y^{-1}$; in other words,

$$\lim_{n \to \infty} b_n^{-1} = \left(\lim_{n \to \infty} b_n\right)^{-1}.$$

(f) Suppose that $y \neq 0$, and that $b_n \neq 0$ for all $n \geq m$. Then
the sequence $(a_n / b_n)_{n=m}^\infty$ converges to $x/y$; in other words,

$$\lim_{n \to \infty} \frac{a_n}{b_n} = \frac{\lim_{n \to \infty} a_n}{\lim_{n \to \infty} b_n}.$$

(g) The sequence $(\max(a_n, b_n))_{n=m}^\infty$ converges to $\max(x, y)$; in
other words,

$$\lim_{n \to \infty} \max(a_n, b_n) = \max\left(\lim_{n \to \infty} a_n, \lim_{n \to \infty} b_n\right).$$

(h) The sequence $(\min(a_n, b_n))_{n=m}^\infty$ converges to $\min(x, y)$; in
other words,

$$\lim_{n \to \infty} \min(a_n, b_n) = \min\left(\lim_{n \to \infty} a_n, \lim_{n \to \infty} b_n\right).$$

Proof. See Exercise 6.1.8. □
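
As an informal illustration of the limit laws, one can take two concrete convergent sequences and watch the combined sequences approach the predicted limits. In the sketch below the sequences (one with limit 1, the other with limit 2) and all names are assumptions chosen for the example. Exact rationals are used so that no rounding error enters the comparison.

```python
from fractions import Fraction

def a(n: int) -> Fraction:
    return 1 + Fraction(1, n)      # converges to x = 1

def b(n: int) -> Fraction:
    return 2 - Fraction(1, n)      # converges to y = 2; b(n) != 0 for every n >= 1

x, y = Fraction(1), Fraction(2)

def errors(n: int) -> dict:
    """Distance between each combined term and the limit predicted by the limit laws."""
    return {
        "sum":        abs((a(n) + b(n)) - (x + y)),
        "product":    abs(a(n) * b(n) - x * y),
        "difference": abs((a(n) - b(n)) - (x - y)),
        "quotient":   abs(a(n) / b(n) - x / y),
        "max":        abs(max(a(n), b(n)) - max(x, y)),
        "min":        abs(min(a(n), b(n)) - min(x, y)),
    }

for n in (10, 1000, 100000):
    print(n, {name: float(err) for name, err in errors(n).items()})
```

Each error shrinks as n grows, which is consistent with the theorem; a numerical table of this kind is of course no substitute for the proof requested in Exercise 6.1.8.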


— Exercises — 

Exercise 6.1.1. Let $(a_n)_{n=0}^\infty$ be a sequence of real numbers, such
that $a_{n+1} > a_n$ for each natural number $n$. Prove that whenever $n$ and
$m$ are natural numbers such that $m > n$, then we have $a_m > a_n$. (We refer
to these sequences as increasing sequences.)

Exercise 6.1.2. Let $(a_n)_{n=m}^\infty$ be a sequence of real numbers, and
let $L$ be a real number. Show that $(a_n)_{n=m}^\infty$ converges to $L$ if
and only if, given any real $\varepsilon > 0$, one can find an $N \geq m$ such
that $|a_n - L| \leq \varepsilon$ for all $n \geq N$.

Exercise 6.1.3. Let $(a_n)_{n=m}^\infty$ be a sequence of real numbers, let
$c$ be a real number, and let $m' \geq m$ be an integer. Show that
$(a_n)_{n=m}^\infty$ converges to $c$ if and only if $(a_n)_{n=m'}^\infty$
converges to $c$.

Exercise 6.1.4. Let $(a_n)_{n=m}^\infty$ be a sequence of real numbers, let
$c$ be a real number, and let $k \geq 0$ be a non-negative integer. Show that
$(a_n)_{n=m}^\infty$ converges to $c$ if and only if $(a_{n+k})_{n=m}^\infty$
converges to $c$.

Exercise 6.1.5. Prove Proposition 6.1.12. (Hint: use the triangle inequality,
or Proposition 4.3.7.)

Exercise 6.1.6. Prove Proposition 6.1.15, using the following outline. Let
$(a_n)_{n=m}^\infty$ be a Cauchy sequence of rationals, and write
$L := \mathrm{LIM}_{n \to \infty} a_n$. We have to show that
$(a_n)_{n=m}^\infty$ converges to $L$. Let $\varepsilon > 0$. Assume for sake
of contradiction that the sequence $a_n$ is not eventually
$\varepsilon$-close to $L$. Use this, and the fact that $(a_n)_{n=m}^\infty$
is Cauchy, to show that there is an $N \geq m$ such that either
$a_n > L + \varepsilon/2$ for all $n \geq N$, or $a_n < L - \varepsilon/2$ for
all $n \geq N$. Then use Exercise 5.4.8.

Exercise 6.1.7. Show that Definition 6.1.16 is consistent with Definition
5.1.12 (i.e., prove an analogue of Proposition 6.1.4 for bounded sequences
instead of Cauchy sequences).

Exercise 6.1.8. Prove Theorem 6.1.19. (Hint: you can use some parts of the
theorem to prove others, e.g., (b) can be used to prove (c); (a), (c) can be
used to prove (d); and (b), (e) can be used to prove (f). The proofs are
similar to those of Lemma 5.3.6, Proposition 5.3.10, and Lemma 5.3.15. For
(e), you may need to first prove the auxiliary result that any sequence whose
elements are non-zero, and which converges to a non-zero limit, is bounded
away from zero.)

Exercise 6.1.9. Explain why Theorem 6.1.19(f) fails when the limit of the
denominator is 0. (To repair that problem requires L'Hôpital's rule, see
Section 10.5.)

Exercise 6.1.10. Show that the concept of equivalent Cauchy sequence, as
defined in Definition 5.2.6, does not change if $\varepsilon$ is required to
be positive real instead of positive rational. More precisely, if
$(a_n)_{n=0}^\infty$ and $(b_n)_{n=0}^\infty$ are sequences of reals, show
that $(a_n)_{n=0}^\infty$ and $(b_n)_{n=0}^\infty$ are eventually
$\varepsilon$-close for every rational $\varepsilon > 0$ if and only if they
are eventually $\varepsilon$-close for every real $\varepsilon > 0$. (Hint:
modify the proof of Proposition 6.1.4.)

6.2 The Extended real number system 

There are some sequences which do not converge to any real number, but instead
seem to be wanting to converge to $+\infty$ or $-\infty$. For instance, it
seems intuitive that the sequence

1, 2, 3, 4, 5, ...

should be converging to $+\infty$, while

-1, -2, -3, -4, -5, ...

should be converging to $-\infty$. Meanwhile, the sequence

1, -1, 1, -1, 1, -1, ...

does not seem to be converging to anything (although we shall see later that
it does have $+1$ and $-1$ as "limit points"; see below). Similarly the
sequence

1, -2, 3, -4, 5, -6, ...

does not converge to any real number, and also does not appear to be
converging to $+\infty$ or converging to $-\infty$. To make this precise we
need to talk about something called the extended real number system.

Definition 6.2.1 (Extended real number system). The extended real number
system $\mathbf{R}^*$ is the real line $\mathbf{R}$ with two additional
elements attached, called $+\infty$ and $-\infty$. These elements are distinct
from each other and also distinct from every real number. An extended real
number $x$ is called finite iff it is a real number, and infinite iff it is
equal to $+\infty$ or $-\infty$. (This definition is not directly related to
the notion of finite and infinite sets in Section 3.6, though it is of course
similar in spirit.)

These new symbols, $+\infty$ and $-\infty$, at present do not have much
meaning, since we have no operations to manipulate them (other than equality
$=$ and inequality $\neq$). Now we place a few operations on the extended real
number system.

Definition 6.2.2 (Negation of extended reals). The operation of negation
$x \mapsto -x$ on $\mathbf{R}$, we now extend to $\mathbf{R}^*$ by defining
$-(+\infty) := -\infty$ and $-(-\infty) := +\infty$.

Thus every extended real number $x$ has a negation, and $-(-x)$ is always
equal to $x$.

Definition 6.2.3 (Ordering of extended reals). Let $x$ and $y$ be extended
real numbers. We say that $x \leq y$, i.e., $x$ is less than or equal to $y$,
iff one of the following three statements is true:

(a) $x$ and $y$ are real numbers, and $x \leq y$ as real numbers.

(b) $y = +\infty$.

(c) $x = -\infty$.

We say that $x < y$ if we have $x \leq y$ and $x \neq y$. We sometimes write
$x < y$ as $y > x$, and $x \leq y$ as $y \geq x$.

Examples 6.2.4. $3 \leq 5$, $3 < +\infty$, and $-\infty < +\infty$, but
$3 \not\leq -\infty$.

Some basic properties of order and negation on the extended real 
number system: 

Proposition 6.2.5. Let $x, y, z$ be extended real numbers. Then the following
statements are true:

(a) (Reflexivity) We have $x \leq x$.

(b) (Trichotomy) Exactly one of the statements $x < y$, $x = y$, or $x > y$
is true.

(c) (Transitivity) If $x \leq y$ and $y \leq z$, then $x \leq z$.

(d) (Negation reverses order) If $x \leq y$, then $-y \leq -x$.

Proof. See Exercise 6.2.1. □

One could also introduce other operations on the extended real number system,
such as addition, multiplication, etc. However, this is somewhat dangerous as
these operations will almost certainly fail to obey the familiar rules of
algebra. For instance, to define addition it seems reasonable (given one's
intuitive notion of infinity) to set $+\infty + 5 = +\infty$ and
$+\infty + 3 = +\infty$, but then this implies that
$+\infty + 5 = +\infty + 3$, while $5 \neq 3$. So things like the cancellation
law begin to break down once we try to operate involving infinity. To avoid
these issues we shall simply not define any arithmetic operations on the
extended real number system other than negation and order.
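
The failure of cancellation can be observed in floating-point arithmetic, which (unlike this text) does define addition involving an infinite value. The sketch below uses Python's `math.inf` purely as an analogy for $+\infty$; floating-point numbers are not the extended real number system, and that identification is an assumption of the example.

```python
import math

inf = math.inf   # IEEE floating-point infinity, used here only as a stand-in for +infinity

# Both sums collapse to +infinity, so "cancelling" inf would give the false claim 5 == 3.
print(inf + 5 == inf + 3)   # True
print(5 == 3)               # False

# Negation and order, the two operations the text does keep, behave as expected.
print(-(-inf) == inf)       # True
print(-inf < 3 < inf)       # True
```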

Remember that we defined the notion of supremum or least upper bound of a set
$E$ of reals; this gave an extended real number $\sup(E)$, which was either
finite or infinite. We now extend this notion slightly.

Definition 6.2.6 (Supremum of sets of extended reals). Let $E$ be a subset of
$\mathbf{R}^*$. Then we define the supremum $\sup(E)$ or least upper bound of
$E$ by the following rule.

(a) If $E$ is contained in $\mathbf{R}$ (i.e., $+\infty$ and $-\infty$ are not
elements of $E$), then we let $\sup(E)$ be as defined in Definition 5.5.10.

(b) If $E$ contains $+\infty$, then we set $\sup(E) := +\infty$.

(c) If $E$ does not contain $+\infty$ but does contain $-\infty$, then we set
$\sup(E) := \sup(E \setminus \{-\infty\})$ (which is a subset of $\mathbf{R}$
and thus falls under case (a)).

We also define the infimum $\inf(E)$ of $E$ (also known as the greatest lower
bound of $E$) by the formula

$$\inf(E) := -\sup(-E)$$

where $-E$ is the set $-E := \{-x : x \in E\}$.

Example 6.2.7. Let $E$ be the negative integers, together with $-\infty$:

$$E = \{-1, -2, -3, -4, \ldots\} \cup \{-\infty\}.$$



Then $\sup(E) = \sup(E \setminus \{-\infty\}) = -1$, while

$$\inf(E) = -\sup(-E) = -(+\infty) = -\infty.$$

Example 6.2.8. The set {0.9,0.99,0.999,0.9999,...} has infimum 0.9 
and supremum 1. Note that in this case the supremum does not actually 
belong to the set, but it is in some sense “touching it” from the right. 

Example 6.2.9. The set $\{1, 2, 3, 4, 5, \ldots\}$ has infimum 1 and supremum
$+\infty$.

Example 6.2.10. Let $E$ be the empty set. Then $\sup(E) = -\infty$ and
$\inf(E) = +\infty$ (why?). This is the only case in which the supremum can
be less than the infimum (why?).
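
For finite sets these conventions are easy to experiment with. The following sketch handles only finite collections and models extended reals by Python floats together with `math.inf`; both restrictions, and the names `sup_ext` and `inf_ext`, are assumptions of the example. For a finite non-empty collection the supremum is simply the maximum, and the empty collection is assigned the value from Example 6.2.10.

```python
import math

def sup_ext(E):
    """Supremum of a finite collection of extended reals (floats, with +/-math.inf).

    A finite non-empty collection attains its supremum as its maximum;
    the empty collection gets -infinity, matching Example 6.2.10.
    """
    return max(E, default=-math.inf)

def inf_ext(E):
    """Infimum via the formula inf(E) := -sup(-E)."""
    return -sup_ext([-x for x in E])

print(sup_ext([]), inf_ext([]))                   # -inf inf
print(sup_ext([-1, -2, -3, -math.inf]))           # -1
print(inf_ext([-1, -2, -3, -math.inf]))           # -inf
```

Infinite sets such as the one in Example 6.2.7 cannot be enumerated this way; the last two lines only imitate that example with a finite truncation.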

One can intuitively think of the supremum of $E$ as follows. Imagine the real
line with $+\infty$ somehow on the far right, and $-\infty$ on the far left.
Imagine a piston at $+\infty$ moving leftward until it is stopped by the
presence of a set $E$; the location where it stops is the supremum of $E$.
Similarly if one imagines a piston at $-\infty$ moving rightward until it is
stopped by the presence of $E$, the location where it stops is the infimum of
$E$. In the case when $E$ is the empty set, the pistons pass through each
other, the supremum landing at $-\infty$ and the infimum landing at
$+\infty$.

The following theorem justifies the terminology “least upper bound” 
and “greatest lower bound”: 

Theorem 6.2.11. Let $E$ be a subset of $\mathbf{R}^*$. Then the following
statements are true.

(a) For every $x \in E$ we have $x \leq \sup(E)$ and $x \geq \inf(E)$.

(b) Suppose that $M \in \mathbf{R}^*$ is an upper bound for $E$, i.e.,
$x \leq M$ for all $x \in E$. Then we have $\sup(E) \leq M$.

(c) Suppose that $M \in \mathbf{R}^*$ is a lower bound for $E$, i.e.,
$x \geq M$ for all $x \in E$. Then we have $\inf(E) \geq M$.

Proof. See Exercise 6.2.2. □ 


— Exercises — 

Exercise 6.2.1. Prove Proposition 6.2.5. (Hint: you may need Proposition 
5.4.7.) 

Exercise 6.2.2. Prove Theorem 6.2.11. (Hint: you may need to break into
cases depending on whether $+\infty$ or $-\infty$ belongs to $E$. You can of
course use Definition 5.5.10, provided that $E$ consists only of real
numbers.)

6.3 Suprema and Infima of sequences

Having defined the notion of a supremum and infimum of sets of reals, we can
now also talk about the supremum and infimum of sequences.

Definition 6.3.1 (Sup and inf of sequences). Let $(a_n)_{n=m}^\infty$ be a
sequence of real numbers. Then we define $\sup(a_n)_{n=m}^\infty$ to be the
supremum of the set $\{a_n : n \geq m\}$, and $\inf(a_n)_{n=m}^\infty$ to be
the infimum of the same set $\{a_n : n \geq m\}$.

Remark 6.3.2. The quantities $\sup(a_n)_{n=m}^\infty$ and
$\inf(a_n)_{n=m}^\infty$ are sometimes written as $\sup_{n \geq m} a_n$ and
$\inf_{n \geq m} a_n$ respectively.

Example 6.3.3. Let $a_n := (-1)^n$; thus $(a_n)_{n=1}^\infty$ is the sequence
-1, 1, -1, 1, .... Then the set $\{a_n : n \geq 1\}$ is just the two-element
set $\{-1, 1\}$, and hence $\sup(a_n)_{n=1}^\infty$ is equal to 1. Similarly
$\inf(a_n)_{n=1}^\infty$ is equal to $-1$.

Example 6.3.4. Let $a_n := 1/n$; thus $(a_n)_{n=1}^\infty$ is the sequence
1, 1/2, 1/3, .... Then the set $\{a_n : n \geq 1\}$ is the countable set
$\{1, 1/2, 1/3, 1/4, \ldots\}$. Thus $\sup(a_n)_{n=1}^\infty = 1$ and
$\inf(a_n)_{n=1}^\infty = 0$ (Exercise 6.3.1). Notice here that the infimum of
the sequence is not actually a member of the sequence, though it becomes very
close to the sequence eventually. (So it is a little inaccurate to think of
the supremum and infimum as the "largest element of the sequence" and
"smallest element of the sequence" respectively.)

Example 6.3.5. Let $a_n := n$; thus $(a_n)_{n=1}^\infty$ is the sequence
1, 2, 3, 4, .... Then the set $\{a_n : n \geq 1\}$ is just the positive
integers $\{1, 2, 3, 4, \ldots\}$. Then $\sup(a_n)_{n=1}^\infty = +\infty$ and
$\inf(a_n)_{n=1}^\infty = 1$.

As the last example shows, it is possible for the supremum or infimum of a
sequence to be $+\infty$ or $-\infty$. However, if a sequence
$(a_n)_{n=m}^\infty$ is bounded, say bounded by $M$, then all the elements
$a_n$ of the sequence lie between $-M$ and $M$, so that the set
$\{a_n : n \geq m\}$ has $M$ as an upper bound and $-M$ as a lower bound.
Since this set is clearly non-empty, we can thus conclude that the supremum
and infimum of a bounded sequence are real numbers (i.e., not $+\infty$ and
$-\infty$).

Proposition 6.3.6 (Least upper bound property). Let $(a_n)_{n=m}^\infty$ be a
sequence of real numbers, and let $x$ be the extended real number
$x := \sup(a_n)_{n=m}^\infty$. Then we have $a_n \leq x$ for all $n \geq m$.
Also, whenever $M \in \mathbf{R}^*$ is an upper bound for $a_n$ (i.e.,
$a_n \leq M$ for all $n \geq m$), we have $x \leq M$. Finally, for every
extended real number $y$ for which $y < x$, there exists at least one
$n \geq m$ for which $y < a_n \leq x$.

Proof. See Exercise 6.3.2. □ 

Remark 6.3.7. There is a corresponding Proposition for infima, but 
with all the references to order reversed, e.g., all upper bounds should 
now be lower bounds, etc. The proof is exactly the same. 

Now we give an application of these concepts of supremum and infimum. In the
previous section we saw that all convergent sequences are bounded. It is
natural to ask whether the converse is true: are all bounded sequences
convergent? The answer is no; for instance, the sequence 1, -1, 1, -1, ... is
bounded, but not Cauchy and hence not convergent. However, if we make the
sequence both bounded and monotone (i.e., increasing or decreasing), then it
is true that it must converge:

Proposition 6.3.8 (Monotone bounded sequences converge). Let
$(a_n)_{n=m}^\infty$ be a sequence of real numbers which has some finite upper
bound $M \in \mathbf{R}$, and which is also increasing (i.e.,
$a_{n+1} \geq a_n$ for all $n \geq m$). Then $(a_n)_{n=m}^\infty$ is
convergent, and in fact

$$\lim_{n \to \infty} a_n = \sup(a_n)_{n=m}^\infty \leq M.$$

Proof. See Exercise 6.3.3. □ 

One can similarly prove that if a sequence $(a_n)_{n=m}^\infty$ is bounded
below and decreasing (i.e., $a_{n+1} \leq a_n$), then it is convergent, and
that the limit is equal to the infimum.

A sequence is said to be monotone if it is either increasing or de- 
creasing. From Proposition 6.3.8 and Corollary 6.1.17 we see that a 
monotone sequence converges if and only if it is bounded. 

Example 6.3.9. The sequence 3,3.1,3.14,3.141,3.1415,... is increas- 
ing, and is bounded above by 4. Hence by Proposition 6.3.8 it must have 
a limit, which is a real number less than or equal to 4. 

Proposition 6.3.8 asserts that the limit of a monotone sequence exists, 
but does not directly say what that limit is. Nevertheless, with a little 
extra work one can often find the limit once one is given that the limit 
does exist. For instance: 

Proposition 6.3.10. Let $0 < x < 1$. Then we have
$\lim_{n \to \infty} x^n = 0$.

Proof. Since $0 < x < 1$, one can show that the sequence $(x^n)_{n=1}^\infty$
is decreasing (why?). On the other hand, the sequence $(x^n)_{n=1}^\infty$ has
a lower bound of 0. Thus by Proposition 6.3.8 (for infima instead of suprema)
the sequence $(x^n)_{n=1}^\infty$ converges to some limit $L$. Since
$x^{n+1} = x \times x^n$, we thus see from the limit laws (Theorem 6.1.19)
that $(x^{n+1})_{n=1}^\infty$ converges to $xL$. But the sequence
$(x^{n+1})_{n=1}^\infty$ is just the sequence $(x^n)_{n=2}^\infty$ shifted by
one, and so they must have the same limits (why?). So $xL = L$. Since
$x \neq 1$, we can solve for $L$ to obtain $L = 0$. Thus $(x^n)_{n=1}^\infty$
converges to 0. □
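
The two facts the proof relies on, that the powers decrease and stay above 0, can be spot-checked for a particular base. The sketch below fixes the base at one half (an arbitrary choice for illustration) and uses a hypothetical helper `powers`. It checks only finitely many terms and so illustrates the hypotheses without proving them.

```python
from fractions import Fraction

def powers(x: Fraction, count: int) -> list:
    """Return [x^1, x^2, ..., x^count] using exact rational arithmetic."""
    terms, current = [], Fraction(1)
    for _ in range(count):
        current *= x
        terms.append(current)
    return terms

x = Fraction(1, 2)
terms = powers(x, 20)

# Bounded below by 0 and decreasing, as the proof claims for 0 < x < 1.
print(all(t > 0 for t in terms))                                          # True
print(all(later < earlier for earlier, later in zip(terms, terms[1:])))   # True

# The terms visibly approach 0, the only solution of x*L = L when x != 1.
print(float(terms[-1]))
```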

Note that this proof does not work when x > 1 (Exercise 6.3.4). 

— Exercises — 

Exercise 6.3.1. Verify the claim in Example 6.3.4. 

Exercise 6.3.2. Prove Proposition 6.3.6. (Hint: use Theorem 6.2.11.) 

Exercise 6.3.3. Prove Proposition 6.3.8. (Hint: use Proposition 6.3.6,
together with the assumption that $a_n$ is increasing, to show that $a_n$
converges to $\sup(a_n)_{n=m}^\infty$.)

Exercise 6.3.4. Explain why Proposition 6.3.10 fails when $x > 1$. In fact,
show that the sequence $(x^n)_{n=1}^\infty$ diverges when $x > 1$. (Hint:
prove by contradiction and use the identity $(1/x)^n x^n = 1$ and the limit
laws in Theorem 6.1.19.) Compare this with the argument in Example 1.2.3; can
you now explain the flaws in the reasoning in that example?

6.4 Limsup, Liminf, and limit points 

Consider the sequence 

1.1, -1.01, 1.001, -1.0001, 1.00001, ....

If one plots this sequence, then one sees (informally, of course) that 
this sequence does not converge; half the time the sequence is getting 
close to 1, and half the time the sequence is getting close to -1, but it is 
not converging to either of them; for instance, it never gets eventually 
1/2-close to 1, and never gets eventually 1/2-close to -1. However, even 
though -1 and +1 are not quite limits of this sequence, it does seem 
that in some vague way they “want” to be limits. To make this notion 
precise we introduce the notion of a limit point. 

Definition 6.4.1 (Limit points). Let $(a_n)_{n=m}^\infty$ be a sequence of
real numbers, let $x$ be a real number, and let $\varepsilon > 0$ be a real
number. We say that $x$ is $\varepsilon$-adherent to $(a_n)_{n=m}^\infty$ iff
there exists an $n \geq m$ such that $a_n$ is $\varepsilon$-close to $x$. We
say that $x$ is continually $\varepsilon$-adherent to $(a_n)_{n=m}^\infty$ iff
it is $\varepsilon$-adherent to $(a_n)_{n=N}^\infty$ for every $N \geq m$. We
say that $x$ is a limit point or adherent point of $(a_n)_{n=m}^\infty$ iff it
is continually $\varepsilon$-adherent to $(a_n)_{n=m}^\infty$ for every
$\varepsilon > 0$.

Remark 6.4.2. The verb "to adhere" means much the same as "to stick to";
hence the term "adhesive".

Unwrapping all the definitions, we see that $x$ is a limit point of
$(a_n)_{n=m}^\infty$ if, for every $\varepsilon > 0$ and every $N \geq m$,
there exists an $n \geq N$ such that $|a_n - x| \leq \varepsilon$. (Why is
this the same definition?) Note the difference between a sequence being
$\varepsilon$-close to $L$ (which means that all the elements of the sequence
stay within a distance $\varepsilon$ of $L$) and $L$ being
$\varepsilon$-adherent to the sequence (which only needs a single element of
the sequence to stay within a distance $\varepsilon$ of $L$). Also, for $L$ to
be continually $\varepsilon$-adherent to $(a_n)_{n=m}^\infty$, it has to be
$\varepsilon$-adherent to $(a_n)_{n=N}^\infty$ for all $N \geq m$, whereas for
$(a_n)_{n=m}^\infty$ to be eventually $\varepsilon$-close to $L$, we only need
$(a_n)_{n=N}^\infty$ to be $\varepsilon$-close to $L$ for some $N \geq m$.
Thus there are some subtle differences in quantifiers between limits and limit
points.

Note that limit points are only defined for finite real numbers. It is also
possible to rigorously define the concept of $+\infty$ or $-\infty$ being a
limit point; see Exercise 6.4.8.

Example 6.4.3. Let $(a_n)_{n=1}^\infty$ denote the sequence

0.9, 0.99, 0.999, 0.9999, 0.99999, .... 

The number 0.8 is 0.1-adherent to this sequence, since 0.8 is 0.1-close to 
0.9, which is a member of this sequence. However, it is not continually 
0.1-adherent to this sequence, since once one discards the first element 
of this sequence there is no member of the sequence to be 0.1-close to. In 
particular, 0.8 is not a limit point of this sequence. On the other hand, 
the number 1 is 0.1-adherent to this sequence, and in fact is continually 
0.1-adherent to this sequence, since no matter how many initial members 
of the sequence one discards, there is still something for 1 to be 0.1-close 
to. In fact, it is continually $\varepsilon$-adherent for every
$\varepsilon > 0$, and is hence a limit point of this sequence.

Example 6.4.4. Now consider the sequence

1.1, -1.01, 1.001, -1.0001, 1.00001, ....

The number 1 is 0.1-adherent to this sequence; in fact it is continually
0.1-adherent to this sequence, because no matter how many elements of the
sequence one discards, there are some elements of the sequence that 1 is
0.1-close to. (As discussed earlier, one does not need all the elements to be
0.1-close to 1, just some; thus 0.1-adherent is weaker than 0.1-close, and
continually 0.1-adherent is a different notion from eventually 0.1-close.) In
fact, for every $\varepsilon > 0$, the number 1 is continually
$\varepsilon$-adherent to this sequence, and is thus a limit point of this
sequence. Similarly $-1$ is a limit point of this sequence; however 0 (say) is
not a limit point of this sequence, since it is not continually 0.1-adherent
to it.
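
The quantifier pattern "for every starting index there exists a later term" can be probed on finitely many starting indices. In the sketch below the closed form for the n-th term, the function names, and the decision to search only two terms past each starting index are all assumptions made for this example; the short search window suffices here because the sequence alternates in sign.

```python
from fractions import Fraction

def a(n: int) -> Fraction:
    """n-th term (n >= 1) of 1.1, -1.01, 1.001, -1.0001, ..."""
    return (-1) ** (n + 1) * (1 + Fraction(1, 10**n))

def adherent_from(x, eps, N: int, horizon: int) -> bool:
    """True if some a_n with N <= n <= horizon satisfies |a_n - x| <= eps."""
    return any(abs(a(n) - x) <= eps for n in range(N, horizon + 1))

def continually_adherent_up_to(x, eps, max_N: int) -> bool:
    """Finite spot check: x is eps-adherent to the tail from N, for N = 1..max_N."""
    return all(adherent_from(x, eps, N, N + 2) for N in range(1, max_N + 1))

eps = Fraction(1, 10)
print(continually_adherent_up_to(1, eps, 50))    # True
print(continually_adherent_up_to(-1, eps, 50))   # True
print(continually_adherent_up_to(0, eps, 50))    # False
```

A finite check of this kind can refute continual adherence (as it does for 0) but can only suggest it for 1 and $-1$; the full statement still needs the argument given in the example.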

Limits are of course a special case of limit points: 

Proposition 6.4.5 (Limits are limit points). Let $(a_n)_{n=m}^\infty$ be a
sequence which converges to a real number $c$. Then $c$ is a limit point of
$(a_n)_{n=m}^\infty$, and in fact it is the only limit point of
$(a_n)_{n=m}^\infty$.

Proof. See Exercise 6.4.1. □ 

Now we will look at two special types of limit points: the limit 
superior (lim sup) and limit inferior (lim inf). 

Definition 6.4.6 (Limit superior and limit inferior). Suppose that
$(a_n)_{n=m}^\infty$ is a sequence. We define a new sequence
$(a_N^+)_{N=m}^\infty$ by the formula

$$a_N^+ := \sup(a_n)_{n=N}^\infty.$$

More informally, $a_N^+$ is the supremum of all the elements in the sequence
from $a_N$ onwards. We then define the limit superior of the sequence
$(a_n)_{n=m}^\infty$, denoted $\limsup_{n \to \infty} a_n$, by the formula

$$\limsup_{n \to \infty} a_n := \inf(a_N^+)_{N=m}^\infty.$$

Similarly, we can define

$$a_N^- := \inf(a_n)_{n=N}^\infty$$

and define the limit inferior of the sequence $(a_n)_{n=m}^\infty$, denoted
$\liminf_{n \to \infty} a_n$, by the formula

$$\liminf_{n \to \infty} a_n := \sup(a_N^-)_{N=m}^\infty.$$

Example 6.4.7. Let $a_1, a_2, a_3, \ldots$ denote the sequence

1.1, -1.01, 1.001, -1.0001, 1.00001, ....

Then $a_1^+, a_2^+, a_3^+, \ldots$ is the sequence

1.1, 1.001, 1.001, 1.00001, 1.00001, ...

(why?), and its infimum is 1. Hence the limit superior of this sequence is 1.
Similarly, $a_1^-, a_2^-, a_3^-, \ldots$ is the sequence

-1.01, -1.01, -1.0001, -1.0001, -1.000001, ...

(why?), and the supremum of this sequence is $-1$. Hence the limit inferior of
this sequence is $-1$. One should compare this with the supremum and infimum
of the sequence, which are 1.1 and $-1.01$ respectively.
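
The two "(why?)" sequences in this example can be reproduced from a finite truncation. The sketch below assumes the closed form for the n-th term and a truncation length `K` (both choices made for this example). For this particular alternating sequence the supremum and infimum of each tail are attained within the first two terms of the tail, so the truncated maxima and minima agree with the true values for every starting index below `K`; that shortcut would not be valid for an arbitrary sequence.

```python
from fractions import Fraction

def a(n: int) -> Fraction:
    """n-th term (n >= 1) of 1.1, -1.01, 1.001, -1.0001, ..."""
    return (-1) ** (n + 1) * (1 + Fraction(1, 10**n))

K = 12                                    # truncation length (an assumption of the sketch)
prefix = [a(n) for n in range(1, K + 1)]

# tail_sup[i] is the maximum of a_{i+1}, ..., a_K, and similarly for tail_inf.
tail_sup = [max(prefix[i:]) for i in range(K)]
tail_inf = [min(prefix[i:]) for i in range(K)]

print([float(t) for t in tail_sup[:5]])   # [1.1, 1.001, 1.001, 1.00001, 1.00001]
print([float(t) for t in tail_inf[:5]])   # [-1.01, -1.01, -1.0001, -1.0001, -1.000001]
```

The tail suprema decrease toward 1 and the tail infima increase toward $-1$, in line with the limit superior and limit inferior computed in the example.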

Example 6.4.8. Let $a_1, a_2, a_3, \ldots$ denote the sequence

1, -2, 3, -4, 5, -6, 7, -8, ...

Then $a_1^+, a_2^+, \ldots$ is the sequence

$+\infty, +\infty, +\infty, +\infty, \ldots$

(why?) and so the limit superior is $+\infty$. Similarly,
$a_1^-, a_2^-, \ldots$ is the sequence

$-\infty, -\infty, -\infty, -\infty, \ldots$

and so the limit inferior is $-\infty$.

Example 6.4.9. Let $a_1, a_2, a_3, \ldots$ denote the sequence

1, -1/2, 1/3, -1/4, 1/5, -1/6, ...

Then $a_1^+, a_2^+, a_3^+, \ldots$ is the sequence

1, 1/3, 1/3, 1/5, 1/5, 1/7, ...

which has an infimum of 0 (why?), so the limit superior is 0. Similarly,
$a_1^-, a_2^-, a_3^-, \ldots$ is the sequence

-1/2, -1/2, -1/4, -1/4, -1/6, -1/6, ...

which has a supremum of 0. So the limit inferior is also 0.

Example 6.4.10. Let $a_1, a_2, a_3, \ldots$ denote the sequence

1, 2, 3, 4, 5, 6, ...

Then $a_1^+, a_2^+, \ldots$ is the sequence

$+\infty, +\infty, +\infty, \ldots$

so the limit superior is $+\infty$. Similarly, $a_1^-, a_2^-, \ldots$ is the
sequence

1, 2, 3, 4, 5, ...

which has a supremum of $+\infty$. So the limit inferior is also $+\infty$.

Remark 6.4.11. Some authors use the notation
$\overline{\lim}_{n \to \infty} a_n$ instead of $\limsup_{n \to \infty} a_n$,
and $\underline{\lim}_{n \to \infty} a_n$ instead of
$\liminf_{n \to \infty} a_n$. Note that the starting index $m$ of the sequence
is irrelevant (see Exercise 6.4.2).

Returning to the piston analogy, imagine a piston at $+\infty$ moving
leftward until it is stopped by the presence of the sequence
$a_1, a_2, \ldots$. The place it will stop is the supremum of
$a_1, a_2, a_3, \ldots$, which in our new notation is $a_1^+$. Now let us
remove the first element $a_1$ from the sequence; this may cause our piston to
slip leftward, to a new point $a_2^+$ (though in many cases the piston will
not move and $a_2^+$ will just be the same as $a_1^+$). Then we remove the
second element $a_2$, causing the piston to slip a little more. If we keep
doing this the piston will keep slipping, but there will be some point where
it cannot go any further, and this is the limit superior of the sequence. A
similar analogy can describe the limit inferior of the sequence.

We now describe some basic properties of limit superior and limit 
inferior. 

Proposition 6.4.12. Let $(a_n)_{n=m}^\infty$ be a sequence of real numbers,
let $L^+$ be the limit superior of this sequence, and let $L^-$ be the limit
inferior of this sequence (thus both $L^+$ and $L^-$ are extended real
numbers).

(a) For every $x > L^+$, there exists an $N \geq m$ such that $a_n < x$ for
all $n \geq N$. (In other words, for every $x > L^+$, the elements of the
sequence $(a_n)_{n=m}^\infty$ are eventually less than $x$.) Similarly, for
every $y < L^-$ there exists an $N \geq m$ such that $a_n > y$ for all
$n \geq N$.

(b) For every $x < L^+$, and every $N \geq m$, there exists an $n \geq N$ such
that $a_n > x$. (In other words, for every $x < L^+$, the elements of the
sequence $(a_n)_{n=m}^\infty$ exceed $x$ infinitely often.) Similarly, for
every $y > L^-$ and every $N \geq m$, there exists an $n \geq N$ such that
$a_n < y$.

(c) We have $\inf(a_n)_{n=m}^\infty \leq L^- \leq L^+ \leq \sup(a_n)_{n=m}^\infty$.

(d) If $c$ is any limit point of $(a_n)_{n=m}^\infty$, then we have
$L^- \leq c \leq L^+$.

(e) If $L^+$ is finite, then it is a limit point of $(a_n)_{n=m}^\infty$.
Similarly, if $L^-$ is finite, then it is a limit point of
$(a_n)_{n=m}^\infty$.

(f) Let $c$ be a real number. If $(a_n)_{n=m}^\infty$ converges to $c$, then
we must have $L^+ = L^- = c$. Conversely, if $L^+ = L^- = c$, then
$(a_n)_{n=m}^\infty$ converges to $c$.

Proof. We shall prove (a) and (b), and leave the remaining parts to the
exercises. Suppose first that $x > L^+$. Then by definition of $L^+$, we have
$x > \inf(a_N^+)_{N=m}^\infty$. By Proposition 6.3.6, there must then exist an
integer $N \geq m$ such that $x > a_N^+$. By definition of $a_N^+$, this means
that $x > \sup(a_n)_{n=N}^\infty$. Thus by Proposition 6.3.6 again, we have
$x > a_n$ for all $n \geq N$, as desired. This proves the first part of (a);
the second part of (a) is proven similarly.

Now we prove (b). Suppose that $x < L^+$. Then we have
$x < \inf(a_N^+)_{N=m}^\infty$. If we fix any $N \geq m$, then by Proposition
6.3.6, we thus have $x < a_N^+$. By definition of $a_N^+$, this means that
$x < \sup(a_n)_{n=N}^\infty$. By Proposition 6.3.6 again, there must thus
exist $n \geq N$ such that $a_n > x$, as desired. This proves the first part
of (b); the second part of (b) is proven similarly.

The proofs of (c), (d), (e), (f) are left to Exercise 6.4.3. □

Parts (c) and (d) of Proposition 6.4.12 say, in particular, that $L^+$
is the largest limit point of $(a_n)_{n=m}^\infty$, and $L^-$ is the smallest limit point
(providing that $L^+$ and $L^-$ are finite). Proposition 6.4.12(f) then says
that if $L^+$ and $L^-$ coincide (so there is only one limit point), then
the sequence in fact converges. This gives a way to test if a sequence
converges: compute its limit superior and limit inferior, and see if they
are equal.
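
The test can be explored numerically, with the caveat that a computer only sees finitely many terms. The following Python sketch is an illustration only: the function names, the two sample sequences, and the cutoff `horizon` are assumptions introduced here, and a maximum over a finite window merely suggests the value of a supremum over an infinite tail.

```python
from fractions import Fraction

def tail_sup_inf(a, N, horizon):
    """Return (max, min) of a(n) over N <= n < horizon, a finite stand-in
    for the sup and inf of the tail (a_n)_{n=N}^infinity."""
    values = [a(n) for n in range(N, horizon)]
    return max(values), min(values)

def decaying(n):
    # a_n = (-1)^n / n, which converges to 0
    return Fraction((-1) ** n, n)

def oscillating(n):
    # a_n = (-1)^n (1 + 10^(-n)), whose lim sup is 1 and lim inf is -1
    return (-1) ** n * (1 + Fraction(1, 10 ** n))

for N in (10, 50, 100):
    sup_d, inf_d = tail_sup_inf(decaying, N, 200)
    sup_o, inf_o = tail_sup_inf(oscillating, N, 200)
    # The two numbers approach each other for `decaying` but not for `oscillating`.
    print(N, float(sup_d), float(inf_d), float(sup_o), float(inf_o))
```

For the first sequence the windowed supremum and infimum both shrink toward 0 as N grows, consistent with convergence; for the second they stay near 1 and -1, consistent with divergence.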

We now give a basic comparison property of limit superior and limit 
inferior. 

Lemma 6.4.13 (Comparison principle). Suppose that $(a_n)_{n=m}^\infty$ and
$(b_n)_{n=m}^\infty$ are two sequences of real numbers such that $a_n \leq b_n$ for all
$n \geq m$. Then we have the inequalities

$$\sup(a_n)_{n=m}^\infty \leq \sup(b_n)_{n=m}^\infty$$
$$\inf(a_n)_{n=m}^\infty \leq \inf(b_n)_{n=m}^\infty$$
$$\limsup_{n \to \infty} a_n \leq \limsup_{n \to \infty} b_n$$
$$\liminf_{n \to \infty} a_n \leq \liminf_{n \to \infty} b_n$$

Proof. See Exercise 6.4.4. □ 

Corollary 6.4.14 (Squeeze test). Let $(a_n)_{n=m}^\infty$, $(b_n)_{n=m}^\infty$, and $(c_n)_{n=m}^\infty$
be sequences of real numbers such that

$$a_n \leq b_n \leq c_n$$

for all $n \geq m$. Suppose also that $(a_n)_{n=m}^\infty$ and $(c_n)_{n=m}^\infty$ both converge to
the same limit $L$. Then $(b_n)_{n=m}^\infty$ is also convergent to $L$.

Proof. See Exercise 6.4.5. □ 

Example 6.4.15. We already know (see Proposition 6.1.11) that
$\lim_{n \to \infty} 1/n = 0$. By the limit laws (Theorem 6.1.19), this also
implies that $\lim_{n \to \infty} 2/n = 0$ and $\lim_{n \to \infty} -2/n = 0$. The squeeze test
then shows that any sequence $(b_n)_{n=1}^\infty$ for which

$$-2/n \leq b_n \leq 2/n \text{ for all } n \geq 1$$

is convergent to 0. For instance, we can use this to show that the
sequence $(-1)^n/n + 1/n^2$ converges to zero, or that $2^{-n}$ converges to
zero. Note one can use induction to show that $0 < 2^{-n} \leq 1/n$ for all
$n \geq 1$.
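
The two bounds used in this example are easy to spot-check in exact rational arithmetic. The sketch below is only a sanity check over finitely many n, not a substitute for the induction; the function names are chosen here for illustration.

```python
from fractions import Fraction

def b(n):
    # The sequence (-1)^n / n + 1 / n^2 from the example.
    return Fraction((-1) ** n, n) + Fraction(1, n * n)

def squeezed(n):
    # Checks -2/n <= b_n <= 2/n.
    return Fraction(-2, n) <= b(n) <= Fraction(2, n)

def power_bound(n):
    # Checks 0 < 2^(-n) <= 1/n.
    return 0 < Fraction(1, 2 ** n) <= Fraction(1, n)

print(all(squeezed(n) for n in range(1, 500)))
print(all(power_bound(n) for n in range(1, 500)))
```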

Remark 6.4.16. The squeeze test, combined with the limit laws and the
principle that monotone bounded sequences always have limits, allows
one to compute a large number of limits. We give some examples in the next
chapter.

One commonly used consequence of the squeeze test is the following.

Corollary 6.4.17 (Zero test for sequences). Let $(a_n)_{n=M}^\infty$ be a sequence
of real numbers. Then the limit $\lim_{n \to \infty} a_n$ exists and is equal to zero if
and only if the limit $\lim_{n \to \infty} |a_n|$ exists and is equal to zero.

Proof. See Exercise 6.4.7. □

We close this section with the following improvement to Proposition
6.1.12.

Theorem 6.4.18 (Completeness of the reals). A sequence $(a_n)_{n=1}^\infty$ of
real numbers is a Cauchy sequence if and only if it is convergent.

Remark 6.4.19. Note that while this is very similar in spirit to Propo- 
sition 6.1.15, it is a bit more general, since Proposition 6.1.15 refers to 
Cauchy sequences of rationals instead of real numbers. 

Proof. Proposition 6.1.12 already tells us that every convergent sequence 
is Cauchy, so it suffices to show that every Cauchy sequence is conver- 
gent. 

Let $(a_n)_{n=1}^\infty$ be a Cauchy sequence. We know from Lemma 5.1.15 (or
more precisely, from the extension of this lemma to the real numbers,
which is proven in exactly the same fashion) that the sequence $(a_n)_{n=1}^\infty$
is bounded; by Lemma 6.4.13 (or Proposition 6.4.12(c)) this implies that
$L^- := \liminf_{n \to \infty} a_n$ and $L^+ := \limsup_{n \to \infty} a_n$ of the sequence are both
finite. To show that the sequence converges, it will suffice by Proposition
6.4.12(f) to show that $L^- = L^+$.

Now let $\varepsilon > 0$ be any real number. Since $(a_n)_{n=1}^\infty$ is a Cauchy
sequence, it must be eventually $\varepsilon$-steady, so in particular there exists an
$N \geq 1$ such that the sequence $(a_n)_{n=N}^\infty$ is $\varepsilon$-steady. In particular, we
have $a_N - \varepsilon \leq a_n \leq a_N + \varepsilon$ for all $n \geq N$. By Proposition 6.3.6 (or
Lemma 6.4.13) this implies that

$$a_N - \varepsilon \leq \inf(a_n)_{n=N}^\infty \leq \sup(a_n)_{n=N}^\infty \leq a_N + \varepsilon$$

and hence by the definition of $L^-$ and $L^+$ (and Proposition 6.3.6 again)

$$a_N - \varepsilon \leq L^- \leq L^+ \leq a_N + \varepsilon.$$

Thus we have

$$0 \leq L^+ - L^- \leq 2\varepsilon.$$

But this is true for all $\varepsilon > 0$, and $L^+$ and $L^-$ do not depend on $\varepsilon$;
so we must therefore have $L^+ = L^-$. (If $L^+ > L^-$ then we could set
$\varepsilon := (L^+ - L^-)/3$ and obtain a contradiction.) By Proposition 6.4.12(f)
we thus see that the sequence converges. □

Remark 6.4.20. In the language of metric spaces (see Chapter B.2),
Theorem 6.4.18 asserts that the real numbers are a complete metric
space - that they do not contain “holes” the same way the rationals
do. (Certainly the rationals have lots of Cauchy sequences which do not
converge to other rationals; take for instance the sequence 1, 1.4, 1.41,
1.414, 1.4142, ... which converges to the irrational $\sqrt{2}$.) This property is
closely related to the least upper bound property (Theorem 5.5.9), and is 
one of the principal characteristics which make the real numbers superior 
to the rational numbers for the purposes of doing analysis (taking limits, 
taking derivatives and integrals, finding zeroes of functions, that kind of 
thing), as we shall see in later chapters. 
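
The sequence 1, 1.4, 1.41, ... mentioned in this remark can be generated exactly. The sketch below is an illustration under stated assumptions: the helper name `truncation` is introduced here, and the check only covers finitely many terms. It suggests that the terms cluster together (as a Cauchy sequence must) while no term squares to exactly 2.

```python
import math
from fractions import Fraction

def truncation(n):
    """The decimal expansion of sqrt(2) cut off after n decimal places,
    as an exact rational: floor(sqrt(2) * 10^n) / 10^n."""
    return Fraction(math.isqrt(2 * 10 ** (2 * n)), 10 ** n)

terms = [truncation(n) for n in range(12)]
print([str(q) for q in terms[:5]])

# Any two terms with indices n, m >= N differ by at most 10^(-N).
steady = all(
    abs(terms[n] - terms[m]) <= Fraction(1, 10 ** min(n, m))
    for n in range(12) for m in range(12)
)
# No rational term is an exact square root of 2.
never_exact = all(q * q != 2 for q in terms)
print(steady, never_exact)
```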

— Exercises — 

Exercise 6.4.1. Prove Proposition 6.4.5. 

Exercise 6.4.2. State and prove analogues of Exercises 6.1.3 and 6.1.4 for limit 
points, limit superior, and limit inferior. 

Exercise 6.4.3. Prove parts (c),(d),(e),(f) of Proposition 6.4.12. (Hint: you can 
use earlier parts of the proposition to prove later ones.) 

Exercise 6.4.4. Prove Lemma 6.4.13. 

Exercise 6.4.5. Use Lemma 6.4.13 to prove Corollary 6.4.14. 

Exercise 6.4.6. Give an example of two bounded sequences $(a_n)_{n=1}^\infty$ and $(b_n)_{n=1}^\infty$
such that $a_n < b_n$ for all $n \geq 1$, but that $\sup(a_n)_{n=1}^\infty \not< \sup(b_n)_{n=1}^\infty$. Explain
why this does not contradict Lemma 6.4.13.

Exercise 6.4.7. Prove Corollary 6.4.17. Is the corollary still true if we replace 
zero in the statement of this Corollary by some other number? 

Exercise 6.4.8. Let us say that a sequence $(a_n)_{n=M}^\infty$ of real numbers has $+\infty$
as a limit point iff it has no finite upper bound, and that it has $-\infty$ as a
limit point iff it has no finite lower bound. With this definition, show that
$\limsup_{n \to \infty} a_n$ is a limit point of $(a_n)_{n=M}^\infty$, and furthermore that it is larger
than all the other limit points of $(a_n)_{n=M}^\infty$; in other words, the limit superior is
the largest limit point of a sequence. Similarly, show that the limit inferior is
the smallest limit point of a sequence. (One can use Proposition 6.4.12 in the
course of the proof.)

Exercise 6.4.9. Using the definition in Exercise 6.4.8, construct a sequence
$(a_n)_{n=1}^\infty$ which has exactly three limit points, at $-\infty$, 0, and $+\infty$.

Exercise 6.4.10. Let $(a_n)_{n=N}^\infty$ be a sequence of real numbers, and let $(b_m)_{m=M}^\infty$
be another sequence of real numbers such that each $b_m$ is a limit point of
$(a_n)_{n=N}^\infty$. Let $c$ be a limit point of $(b_m)_{m=M}^\infty$. Prove that $c$ is also a limit point
of $(a_n)_{n=N}^\infty$. (In other words, limit points of limit points are themselves limit
points of the original sequence.)

6.5 Some standard limits

Armed with the limit laws and the squeeze test, we can now compute
a large number of limits.

A particularly simple limit is that of the constant sequence
$c, c, c, c, \ldots$; we clearly have

$$\lim_{n \to \infty} c = c$$

for any constant $c$ (why?).

Also, in Proposition 6.1.11, we proved that $\lim_{n \to \infty} 1/n = 0$. This
now implies

Corollary 6.5.1. We have $\lim_{n \to \infty} 1/n^{1/k} = 0$ for every integer $k \geq 1$.

Proof. From Lemma 5.6.6 we know that $1/n^{1/k}$ is a decreasing function
of $n$, while being bounded below by 0. By Proposition 6.3.8 (for
decreasing sequences instead of increasing sequences) we thus know that
this sequence converges to some limit $L \geq 0$:

$$L = \lim_{n \to \infty} 1/n^{1/k}.$$

Raising this to the $k$-th power and using the limit laws (or more precisely,
Theorem 6.1.19(b) and induction), we obtain

$$L^k = \lim_{n \to \infty} 1/n.$$

By Proposition 6.1.11 we thus have $L^k = 0$; but this means that $L$
cannot be positive (else $L^k$ would be positive), so $L = 0$, and we are
done. □

Some other basic limits: 

Lemma 6.5.2. Let $x$ be a real number. Then the limit $\lim_{n \to \infty} x^n$ exists
and is equal to zero when $|x| < 1$, exists and is equal to 1 when $x = 1$,
and diverges when $x = -1$ or when $|x| > 1$.

Proof. See Exercise 6.5.2. □ 

Lemma 6.5.3. For any $x > 0$, we have $\lim_{n \to \infty} x^{1/n} = 1$.

Proof. See Exercise 6.5.3. □ 

We will derive a few more standard limits later on, once we develop
the root and ratio tests for series and for sequences. 

— Exercises — 

Exercise 6.5.1. Show that $\lim_{n \to \infty} 1/n^q = 0$ for any rational $q > 0$. (Hint:
use Corollary 6.5.1 and the limit laws, Theorem 6.1.19.) Conclude that the
limit $\lim_{n \to \infty} n^q$ does not exist. (Hint: argue by contradiction using Theorem
6.1.19(e).)

Exercise 6.5.2. Prove Lemma 6.5.2. (Hint: use Proposition 6.3.10, Exercise 
6.3.4, and the squeeze test.) 

Exercise 6.5.3. Prove Lemma 6.5.3. (Hint: you may need to treat the cases
$x \geq 1$ and $x < 1$ separately. You might wish to first use Lemma 6.5.2 to prove
the preliminary result that for every $\varepsilon > 0$ and every real number $M > 0$, there
exists an $n$ such that $M^{1/n} \leq 1 + \varepsilon$.)


6.6 Subsequences 

This chapter has been devoted to the study of sequences $(a_n)_{n=1}^\infty$ of real
numbers, and their limits. Some sequences were convergent to a single
limit, while others had multiple limit points. For instance, the sequence

1.1, 0.1, 1.01, 0.01, 1.001, 0.001, 1.0001, ...

has two limit points at 0 and 1 (which are incidentally also the lim inf
and lim sup respectively), but is not actually convergent (since the lim
sup and lim inf are not equal). However, while this sequence is not
convergent, it does appear to contain convergent components; it seems
to be a mixture of two convergent subsequences, namely

1.1, 1.01, 1.001, ...

and

0.1, 0.01, 0.001, ....

To make this notion more precise, we need a notion of subsequence.


Definition 6.6.1 (Subsequences). Let $(a_n)_{n=0}^\infty$ and $(b_n)_{n=0}^\infty$ be sequences
of real numbers. We say that $(b_n)_{n=0}^\infty$ is a subsequence of $(a_n)_{n=0}^\infty$ iff
there exists a function $f : \mathbf{N} \to \mathbf{N}$ which is strictly increasing (i.e.,
$f(n+1) > f(n)$ for all $n \in \mathbf{N}$) such that

$$b_n = a_{f(n)} \text{ for all } n \in \mathbf{N}.$$
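
The definition translates directly into code if a sequence is modelled as a function on the natural numbers. In the sketch below, the helper names and the closed form chosen for the sequence 1.1, 0.1, 1.01, 0.01, ... are assumptions made for illustration, and strict monotonicity of the index function is only checked on an initial segment.

```python
from fractions import Fraction

def a(n):
    # 1.1, 0.1, 1.01, 0.01, 1.001, 0.001, ... indexed from n = 0
    k = n // 2 + 1
    tail = Fraction(1, 10 ** k)
    return 1 + tail if n % 2 == 0 else tail

def subsequence(a, f):
    """Return the sequence b with b(n) = a(f(n))."""
    return lambda n: a(f(n))

def is_strictly_increasing(f, upto):
    # Checks f(n + 1) > f(n) for 0 <= n < upto only.
    return all(f(n + 1) > f(n) for n in range(upto))

evens = subsequence(a, lambda n: 2 * n)       # 1.1, 1.01, 1.001, ...
odds = subsequence(a, lambda n: 2 * n + 1)    # 0.1, 0.01, 0.001, ...

print([float(evens(n)) for n in range(4)])
print([float(odds(n)) for n in range(4)])
print(is_strictly_increasing(lambda n: 2 * n, 100))
```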

Example 6.6.2. If $(a_n)_{n=0}^\infty$ is a sequence, then $(a_{2n})_{n=0}^\infty$ is a
subsequence of $(a_n)_{n=0}^\infty$, since the function $f : \mathbf{N} \to \mathbf{N}$ defined by $f(n) := 2n$
is a strictly increasing function from $\mathbf{N}$ to $\mathbf{N}$. Note that we do not
assume $f$ to be bijective, although it is necessarily injective (why?). More
informally, the sequence

$$a_0, a_2, a_4, a_6, \ldots$$

is a subsequence of

$$a_0, a_1, a_2, a_3, a_4, \ldots$$

Example 6.6.3. The two sequences

1.1, 1.01, 1.001, ...

and

0.1, 0.01, 0.001, ...

mentioned earlier are both subsequences of

1.1, 0.1, 1.01, 0.01, 1.001, 0.001, 1.0001, ...

The property of being a subsequence is reflexive and transitive, 
though not symmetric: 

Lemma 6.6.4. Let $(a_n)_{n=0}^\infty$, $(b_n)_{n=0}^\infty$, and $(c_n)_{n=0}^\infty$ be sequences of real
numbers. Then $(a_n)_{n=0}^\infty$ is a subsequence of $(a_n)_{n=0}^\infty$. Furthermore, if
$(b_n)_{n=0}^\infty$ is a subsequence of $(a_n)_{n=0}^\infty$, and $(c_n)_{n=0}^\infty$ is a subsequence of
$(b_n)_{n=0}^\infty$, then $(c_n)_{n=0}^\infty$ is a subsequence of $(a_n)_{n=0}^\infty$.

Proof. See Exercise 6.6.1. □ 

We now relate the concept of subsequences to the concept of limits 
and limit points. 

Proposition 6.6.5 (Subsequences related to limits). Let $(a_n)_{n=0}^\infty$ be a
sequence of real numbers, and let $L$ be a real number. Then the following
two statements are logically equivalent (each one implies the other):

(a) The sequence $(a_n)_{n=0}^\infty$ converges to $L$.

(b) Every subsequence of $(a_n)_{n=0}^\infty$ converges to $L$.

Proof. See Exercise 6.6.4. □

Proposition 6.6.6 (Subsequences related to limit points). Let $(a_n)_{n=0}^\infty$
be a sequence of real numbers, and let $L$ be a real number. Then the
following two statements are logically equivalent.

(a) $L$ is a limit point of $(a_n)_{n=0}^\infty$.

(b) There exists a subsequence of $(a_n)_{n=0}^\infty$ which converges to $L$.

Proof. See Exercise 6.6.5. □ 

Remark 6.6.7. The above two propositions give a sharp contrast be- 
tween the notion of a limit, and that of a limit point. When a sequence 
has a limit L, then all subsequences also converge to L. But when a 
sequence has L as a limit point, then only some subsequences converge 
to L. 
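
To see the contrast concretely, one can pick out indices along which a sequence approaches one of its limit points. The sketch below is a hedged illustration: the sample sequence, the function name `indices_toward`, and the `search_limit` safeguard are all assumptions introduced here, and the index rule follows the recipe suggested in the hint to Exercise 6.6.5.

```python
from fractions import Fraction

def a(n):
    # (-1)^n (1 + 1/(n+1)); its limit points are 1 and -1.
    return (-1) ** n * (1 + Fraction(1, n + 1))

def indices_toward(a, L, terms, search_limit=10_000):
    """Greedily pick n_1 < n_2 < ... with |a(n_j) - L| <= 1/j,
    starting the search after n_0 := 0."""
    picked, prev = [], 0
    for j in range(1, terms + 1):
        n = prev + 1
        while abs(a(n) - L) > Fraction(1, j):
            n += 1
            if n > search_limit:
                raise ValueError("no suitable index found; is L a limit point?")
        picked.append(n)
        prev = n
    return picked

toward_plus_one = indices_toward(a, Fraction(1), 5)
toward_minus_one = indices_toward(a, Fraction(-1), 5)
print(toward_plus_one, toward_minus_one)
```

The indices chosen for the limit point 1 are all even and those for -1 are all odd, so each limit point is reached only along some subsequences, not along all of them.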

We can now prove an important theorem in real analysis, due to
Bernard Bolzano (1781-1848) and Karl Weierstrass (1815-1897): every
bounded sequence has a convergent subsequence.

Theorem 6.6.8 (Bolzano-Weierstrass theorem). Let $(a_n)_{n=0}^\infty$ be a
bounded sequence (i.e., there exists a real number $M > 0$ such that
$|a_n| \leq M$ for all $n \in \mathbf{N}$). Then there is at least one subsequence of
$(a_n)_{n=0}^\infty$ which converges.

Proof. Let $L$ be the limit superior of the sequence $(a_n)_{n=0}^\infty$. Since we have
$-M \leq a_n \leq M$ for all natural numbers $n$, it follows from the comparison
principle (Lemma 6.4.13) that $-M \leq L \leq M$. In particular, $L$ is a real
number (not $+\infty$ or $-\infty$). By Proposition 6.4.12(e), $L$ is thus a limit
point of $(a_n)_{n=0}^\infty$. Thus by Proposition 6.6.6, there exists a subsequence
of $(a_n)_{n=0}^\infty$ which converges (in fact, it converges to $L$). □

Note that we could as well have used the limit inferior instead of the 
limit superior in the above argument. 

Remark 6.6.9. The Bolzano-Weierstrass theorem says that if a
sequence is bounded, then eventually it has no choice but to converge
in some places; it has “no room” to spread out and stop itself from
acquiring limit points. It is not true for unbounded sequences; for
instance, the sequence 1, 2, 3, ... has no convergent subsequences
whatsoever (why?). In the language of topology, this means that the interval
$\{x \in \mathbf{R} : -M \leq x \leq M\}$ is compact, whereas an unbounded set such as
the real line $\mathbf{R}$ is not compact. The distinction between compact sets
and non-compact sets will be very important in later chapters - of similar
importance to the distinction between finite sets and infinite sets.

— Exercises — 

Exercise 6.6.1. Prove Lemma 6.6.4. 

Exercise 6.6.2. Can you find two sequences $(a_n)_{n=0}^\infty$ and $(b_n)_{n=0}^\infty$ which are not
the same sequence, but such that each is a subsequence of the other?

Exercise 6.6.3. Let $(a_n)_{n=0}^\infty$ be a sequence which is not bounded. Show that
there exists a subsequence $(b_n)_{n=0}^\infty$ of $(a_n)_{n=0}^\infty$ such that $\lim_{n \to \infty} 1/b_n$ exists
and is equal to zero. (Hint: for each natural number $j$, recursively introduce
the quantity $n_j := \min\{n \in \mathbf{N} : |a_n| \geq j; n > n_{j-1}\}$ (omitting the condition
$n > n_{j-1}$ when $j = 0$), first explaining why the set $\{n \in \mathbf{N} : |a_n| \geq j; n > n_{j-1}\}$
is non-empty. Then set $b_j := a_{n_j}$.)

Exercise 6.6.4. Prove Proposition 6.6.5. (Note that one of the two implications 
has a very short proof.) 

Exercise 6.6.5. Prove Proposition 6.6.6. (Hint: to show that (a) implies (b),
define the numbers $n_j$ for each natural number $j$ by the formula
$n_j := \min\{n > n_{j-1} : |a_n - L| \leq 1/j\}$, with the convention $n_0 := 0$, explaining why the set
$\{n > n_{j-1} : |a_n - L| \leq 1/j\}$ is non-empty. Then consider the sequence $a_{n_j}$.)

6.7 Real exponentiation, part II 

We finally return to the topic of exponentiation of real numbers that we
started in Section 5.6. In that section we defined $x^q$ for all rational $q$
and positive real numbers $x$, but we have not yet defined $x^\alpha$ when $\alpha$ is
real. We now rectify this situation using limits (in much the same way as
we defined all the other standard operations on the real numbers).
First, we need a lemma: 

Lemma 6.7.1 (Continuity of exponentiation). Let $x > 0$, and let $\alpha$ be a
real number. Let $(q_n)_{n=1}^\infty$ be any sequence of rational numbers converging
to $\alpha$. Then $(x^{q_n})_{n=1}^\infty$ is also a convergent sequence. Furthermore, if
$(q'_n)_{n=1}^\infty$ is any other sequence of rational numbers converging to $\alpha$, then
$(x^{q'_n})_{n=1}^\infty$ has the same limit as $(x^{q_n})_{n=1}^\infty$:

$$\lim_{n \to \infty} x^{q_n} = \lim_{n \to \infty} x^{q'_n}.$$

Proof. There are three cases: $x < 1$, $x = 1$, and $x > 1$. The case $x = 1$
is rather easy (because then $x^q = 1$ for all rational $q$). We shall just do
the case $x > 1$, and leave the case $x < 1$ (which is very similar) to the
reader.

Let us first prove that $(x^{q_n})_{n=1}^\infty$ converges. By Theorem 6.4.18 it
is enough to show that $(x^{q_n})_{n=1}^\infty$ is a Cauchy sequence.

To do this, we need to estimate the distance between $x^{q_n}$ and $x^{q_m}$;
let us say for the time being that $q_n \geq q_m$, so that $x^{q_n} \geq x^{q_m}$ (since
$x > 1$). We have

$$d(x^{q_n}, x^{q_m}) = x^{q_n} - x^{q_m} = x^{q_m}(x^{q_n - q_m} - 1).$$

Since $(q_n)_{n=1}^\infty$ is a convergent sequence, it has some upper bound $M$;
since $x > 1$, we have $x^{q_m} \leq x^M$. Thus

$$d(x^{q_n}, x^{q_m}) = |x^{q_n} - x^{q_m}| \leq x^M(x^{q_n - q_m} - 1).$$

Now let $\varepsilon > 0$. We know by Lemma 6.5.3 that the sequence $(x^{1/k})_{k=1}^\infty$
is eventually $\varepsilon x^{-M}$-close to 1. Thus there exists some $K \geq 1$ such that

$$|x^{1/K} - 1| \leq \varepsilon x^{-M}.$$

Now since $(q_n)_{n=1}^\infty$ is convergent, it is a Cauchy sequence, and so there
is an $N \geq 1$ such that $q_n$ and $q_m$ are $1/K$-close for all $n, m \geq N$. Thus
we have

$$d(x^{q_n}, x^{q_m}) \leq x^M(x^{q_n - q_m} - 1) \leq x^M(x^{1/K} - 1) \leq x^M \varepsilon x^{-M} = \varepsilon$$

for every $n, m \geq N$ such that $q_n \geq q_m$. By symmetry we also have this
bound when $n, m \geq N$ and $q_n \leq q_m$. Thus the sequence $(x^{q_n})_{n=N}^\infty$ is
$\varepsilon$-steady. Thus the sequence $(x^{q_n})_{n=1}^\infty$ is eventually $\varepsilon$-steady for every
$\varepsilon > 0$, and is thus a Cauchy sequence as desired. This proves the convergence
of $(x^{q_n})_{n=1}^\infty$.

Now we prove the second claim. It will suffice to show that

$$\lim_{n \to \infty} x^{q_n - q'_n} = 1,$$

since the claim would then follow from limit laws (since
$x^{q_n} = x^{q_n - q'_n} x^{q'_n}$).

Write $r_n := q_n - q'_n$; by limit laws we know that $(r_n)_{n=1}^\infty$ converges
to 0. We have to show that for every $\varepsilon > 0$, the sequence $(x^{r_n})_{n=1}^\infty$
is eventually $\varepsilon$-close to 1. But from Lemma 6.5.3 we know that the
sequence $(x^{1/k})_{k=1}^\infty$ is eventually $\varepsilon$-close to 1. Since $\lim_{k \to \infty} x^{-1/k}$ is also
equal to 1 by Lemma 6.5.3, we know that $(x^{-1/k})_{k=1}^\infty$ is also eventually
$\varepsilon$-close to 1. Thus we can find a $K$ such that $x^{1/K}$ and $x^{-1/K}$ are both
$\varepsilon$-close to 1. But since $(r_n)_{n=1}^\infty$ is convergent to 0, it is eventually $1/K$-close
to 0, so that eventually $-1/K \leq r_n \leq 1/K$, and thus
$x^{-1/K} \leq x^{r_n} \leq x^{1/K}$. In particular $x^{r_n}$ is also eventually $\varepsilon$-close to 1 (see Proposition
4.3.7(f)), as desired. □

We may now make the following definition.

Definition 6.7.2 (Exponentiation to a real exponent). Let $x > 0$ be
real, and let $\alpha$ be a real number. We define the quantity $x^\alpha$ by the
formula $x^\alpha = \lim_{n \to \infty} x^{q_n}$, where $(q_n)_{n=1}^\infty$ is any sequence of rational
numbers converging to $\alpha$.

Let us check that this definition is well-defined. First of all, given
any real number $\alpha$ we always have at least one sequence $(q_n)_{n=1}^\infty$ of
rational numbers converging to $\alpha$, by the definition of real numbers
(and Proposition 6.1.15). Secondly, given any such sequence $(q_n)_{n=1}^\infty$,
the limit $\lim_{n \to \infty} x^{q_n}$ exists by Lemma 6.7.1. Finally, even though there
can be multiple choices for the sequence $(q_n)_{n=1}^\infty$, they all give the same
limit by Lemma 6.7.1. Thus this definition is well-defined.

If $\alpha$ is not just real but rational, i.e., $\alpha = q$ for some rational $q$,
then this definition could in principle be inconsistent with our earlier
definition of exponentiation in Section 5.6. But in this case $\alpha$ is clearly
the limit of the constant sequence $(q)_{n=1}^\infty$, so by definition $x^\alpha = \lim_{n \to \infty} x^q = x^q$.
Thus the new definition of exponentiation is consistent with the old one.
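
The definition can be imitated numerically by feeding rational approximations of the exponent into an ordinary power routine. The sketch below is an illustration under stated assumptions: the helper names are introduced here, the exponent is taken to be the square root of 2, and the rational powers are evaluated in floating point, so the output only approximates the limit.

```python
import math
from fractions import Fraction

def sqrt2_lower(n):
    # Rational q_n: sqrt(2) truncated to n decimal places (approaches from below).
    return Fraction(math.isqrt(2 * 10 ** (2 * n)), 10 ** n)

def sqrt2_upper(n):
    # A different rational sequence q'_n with the same limit (from above).
    return sqrt2_lower(n) + Fraction(1, 10 ** n)

def power(x, q):
    # Floating-point stand-in for x^q with q rational.
    return x ** float(q)

for n in (1, 3, 6, 12):
    print(n, power(2.0, sqrt2_lower(n)), power(2.0, sqrt2_upper(n)))
```

Both columns settle toward the same value, as Lemma 6.7.1 predicts for any two rational sequences with the same limit.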

Proposition 6.7.3. All the results of Lemma 5.6.9, which held for ra- 
tional numbers q and r, continue to hold for real numbers q and r. 

Proof. We demonstrate this for the identity $x^{q+r} = x^q x^r$ (i.e., the first
part of Lemma 5.6.9(b)); the other parts are similar and are left to
Exercise 6.7.1. The idea is to start with Lemma 5.6.9 for rationals and
then take limits to obtain the corresponding results for reals.

Let $q$ and $r$ be real numbers. Then we can write $q = \lim_{n \to \infty} q_n$
and $r = \lim_{n \to \infty} r_n$ for some sequences $(q_n)_{n=1}^\infty$ and $(r_n)_{n=1}^\infty$ of rationals,
by the definition of real numbers (and Proposition 6.1.15). Then by
the limit laws, $q + r$ is the limit of $(q_n + r_n)_{n=1}^\infty$. By definition of real
exponentiation, we have

$$x^{q+r} = \lim_{n \to \infty} x^{q_n + r_n}; \quad x^q = \lim_{n \to \infty} x^{q_n}; \quad x^r = \lim_{n \to \infty} x^{r_n}.$$

But by Lemma 5.6.9(b) (applied to rational exponents) we have
$x^{q_n + r_n} = x^{q_n} x^{r_n}$. Thus by limit laws we have $x^{q+r} = x^q x^r$, as
desired. □


— Exercises — 


Exercise 6.7.1. Prove the remaining components of Proposition 6.7.3. 

Now that we have developed a reasonable theory of limits of sequences,
we will use that theory to develop a theory of infinite series

$$\sum_{n=m}^\infty a_n = a_m + a_{m+1} + a_{m+2} + \ldots$$


But before we develop infinite series, we must first develop the theory 
of finite series. 


7.1 Finite series 


Definition 7.1.1 (Finite series). Let $m, n$ be integers, and let $(a_i)_{i=m}^n$
be a finite sequence of real numbers, assigning a real number $a_i$ to each
integer $i$ between $m$ and $n$ inclusive (i.e., $m \leq i \leq n$). Then we define
the finite sum (or finite series) $\sum_{i=m}^n a_i$ by the recursive formula

$$\sum_{i=m}^n a_i := 0 \text{ whenever } n < m;$$

$$\sum_{i=m}^{n+1} a_i := \left(\sum_{i=m}^n a_i\right) + a_{n+1} \text{ whenever } n \geq m-1.$$

Thus for instance we have the identities

$$\sum_{i=m}^{m-2} a_i = 0; \quad \sum_{i=m}^{m-1} a_i = 0; \quad \sum_{i=m}^m a_i = a_m;$$

$$\sum_{i=m}^{m+1} a_i = a_m + a_{m+1};$$

$$\sum_{i=m}^{m+2} a_i = a_m + a_{m+1} + a_{m+2}$$

(why?). Because of this, we sometimes express $\sum_{i=m}^n a_i$ less formally as

$$\sum_{i=m}^n a_i = a_m + a_{m+1} + \ldots + a_n.$$
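
The recursive formula can be transcribed almost word for word. The following Python sketch is one possible rendering (the name `finite_sum` and the representation of the sequence as a function are choices made here for illustration): the sum is 0 when n < m, and otherwise it is the sum up to n - 1 plus the last term.

```python
def finite_sum(a, m, n):
    """Sum of a(i) over m <= i <= n, following the recursive definition."""
    if n < m:
        return 0                            # the empty sum
    return finite_sum(a, m, n - 1) + a(n)   # peel off the last term a_n

print(finite_sum(lambda i: i, 1, 5))        # 1 + 2 + 3 + 4 + 5
print(finite_sum(lambda i: i, 3, 1))        # n < m
print(finite_sum(lambda i: i * i, 2, 2))    # the single term a_2
```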

Remark 7.1.2. The difference between “sum” and “series” is a subtle 
linguistic one. Strictly speaking, a series is an expression of the form 
$\sum_{i=m}^n a_i$; this series is mathematically (but not semantically) equal to a
real number, which is then the sum of that series. For instance, 1 + 2 + 
3 + 4 + 5 is a series, whose sum is 15; if one were to be very picky about 
semantics, one would not consider 15 a series and one would not consider 
1 + 2 + 3 + 4 + 5 a sum, despite the two expressions having the same 
value. However, we will not be very careful about this distinction as it is 
purely linguistic and has no bearing on the mathematics; the expressions 
1 + 2 + 3 + 4 + 5 and 15 are the same number, and thus mathematically 
interchangeable, in the sense of the axiom of substitution (see Section 
A.7), even if they are not semantically interchangeable.

Remark 7.1.3. Note that the variable $i$ (sometimes called the index of
summation) is a bound variable (sometimes called a dummy variable);
the expression $\sum_{i=m}^n a_i$ does not actually depend on any quantity named
$i$. In particular, one can replace the index of summation $i$ with any other
symbol, and obtain the same sum:

$$\sum_{i=m}^n a_i = \sum_{j=m}^n a_j.$$

We list some basic properties of summation below. 

Lemma 7.1.4. 

(a) Let $m \leq n < p$ be integers, and let $a_i$ be a real number assigned to
each integer $m \leq i \leq p$. Then we have

$$\sum_{i=m}^n a_i + \sum_{i=n+1}^p a_i = \sum_{i=m}^p a_i.$$

(b) Let $m \leq n$ be integers, $k$ be another integer, and let $a_i$ be a real
number assigned to each integer $m \leq i \leq n$. Then we have

$$\sum_{i=m}^n a_i = \sum_{j=m+k}^{n+k} a_{j-k}.$$

(c) Let $m \leq n$ be integers, and let $a_i, b_i$ be real numbers assigned to
each integer $m \leq i \leq n$. Then we have

$$\sum_{i=m}^n (a_i + b_i) = \left(\sum_{i=m}^n a_i\right) + \left(\sum_{i=m}^n b_i\right).$$

(d) Let $m \leq n$ be integers, and let $a_i$ be a real number assigned to each
integer $m \leq i \leq n$, and let $c$ be another real number. Then we
have

$$\sum_{i=m}^n (c a_i) = c \left(\sum_{i=m}^n a_i\right).$$

(e) (Triangle inequality for finite series) Let $m \leq n$ be integers, and
let $a_i$ be a real number assigned to each integer $m \leq i \leq n$. Then
we have

$$\left|\sum_{i=m}^n a_i\right| \leq \sum_{i=m}^n |a_i|.$$

(f) (Comparison test for finite series) Let $m \leq n$ be integers, and let
$a_i, b_i$ be real numbers assigned to each integer $m \leq i \leq n$. Suppose
that $a_i \leq b_i$ for all $m \leq i \leq n$. Then we have

$$\sum_{i=m}^n a_i \leq \sum_{i=m}^n b_i.$$


Proof. See Exercise 7.1.1. □ 

Remark 7.1.5. In the future we may omit some of the parentheses
in series expressions, for instance we may write $\sum_{i=m}^n (a_i + b_i)$ simply
as $\sum_{i=m}^n a_i + b_i$. This is reasonably safe from being misinterpreted,
because the alternative interpretation $(\sum_{i=m}^n a_i) + b_i$ does not make any
sense (the index $i$ in $b_i$ is meaningless outside of the summation, since $i$
is only a dummy variable).

One can use finite series to also define summations over finite sets: 

Definition 7.1.6 (Summations over finite sets). Let $X$ be a finite set
with $n$ elements (where $n \in \mathbf{N}$), and let $f : X \to \mathbf{R}$ be a function from $X$
to the real numbers (i.e., $f$ assigns a real number $f(x)$ to each element $x$
of $X$). Then we can define the finite sum $\sum_{x \in X} f(x)$ as follows. We first
select any bijection $g$ from $\{i \in \mathbf{N} : 1 \leq i \leq n\}$ to $X$; such a bijection
exists since $X$ is assumed to have $n$ elements. We then define

$$\sum_{x \in X} f(x) := \sum_{i=1}^n f(g(i)).$$

Example 7.1.7. Let $X$ be the three-element set $X := \{a, b, c\}$, where
$a, b, c$ are distinct objects, and let $f : X \to \mathbf{R}$ be the function $f(a) := 2$,
$f(b) := 5$, $f(c) := -1$. In order to compute the sum $\sum_{x \in X} f(x)$, we
select a bijection $g : \{1, 2, 3\} \to X$, e.g., $g(1) := a$, $g(2) := b$, $g(3) := c$.
We then have

$$\sum_{x \in X} f(x) = \sum_{i=1}^3 f(g(i)) = f(a) + f(b) + f(c) = 6.$$

One could pick another bijection $h$ from $\{1, 2, 3\}$ to $X$, e.g., $h(1) := c$,
$h(2) := b$, $h(3) := a$, but the end result is still the same:

$$\sum_{x \in X} f(x) = \sum_{i=1}^3 f(h(i)) = f(c) + f(b) + f(a) = 6.$$
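
For a set this small one can simply try every bijection. In the sketch below (an illustration; the dictionary representation of $f$ and the helper name are assumptions made here), each ordering of the three elements plays the role of a bijection from $\{1, 2, 3\}$ to $X$, and all six orderings give the same sum.

```python
from itertools import permutations

# f(a) := 2, f(b) := 5, f(c) := -1, with the elements named by strings
f = {"a": 2, "b": 5, "c": -1}

def sum_via_bijection(f, ordering):
    """Add up f along the ordering g(1), g(2), ..., g(n)."""
    total = 0
    for x in ordering:
        total += f[x]
    return total

orderings = list(permutations(f))           # all 3! = 6 bijections
all_sums = {sum_via_bijection(f, g) for g in orderings}
print(len(orderings), all_sums)
```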

To verify that this definition actually does give a single, well-defined
value to $\sum_{x \in X} f(x)$, one has to check that different bijections $g$ from
$\{i \in \mathbf{N} : 1 \leq i \leq n\}$ to $X$ give the same sum. In other words, we must
prove

Proposition 7.1.8 (Finite summations are well-defined). Let $X$ be a
finite set with $n$ elements (where $n \in \mathbf{N}$), let $f : X \to \mathbf{R}$ be a function,
and let $g : \{i \in \mathbf{N} : 1 \leq i \leq n\} \to X$ and $h : \{i \in \mathbf{N} : 1 \leq i \leq n\} \to X$
be bijections. Then we have

$$\sum_{i=1}^n f(g(i)) = \sum_{i=1}^n f(h(i)).$$

Remark 7.1.9. The issue is somewhat more complicated when sum- 
ming over infinite sets; see Section 8.2. 

Proof. We use induction on $n$; more precisely, we let $P(n)$ be the
assertion that “For any set $X$ of $n$ elements, any function $f : X \to \mathbf{R}$,
and any two bijections $g, h$ from $\{i \in \mathbf{N} : 1 \leq i \leq n\}$ to $X$, we have
$\sum_{i=1}^n f(g(i)) = \sum_{i=1}^n f(h(i))$”. (More informally, $P(n)$ is the assertion
that Proposition 7.1.8 is true for that value of $n$.) We want to prove
that $P(n)$ is true for all natural numbers $n$.

We first check the base case $P(0)$. In this case $\sum_{i=1}^0 f(g(i))$ and
$\sum_{i=1}^0 f(h(i))$ both equal 0, by definition of finite series, so we are
done.

Now suppose inductively that $P(n)$ is true; we now prove that $P(n+1)$
is true. Thus, let $X$ be a set with $n + 1$ elements, let $f : X \to \mathbf{R}$ be a
function, and let $g$ and $h$ be bijections from $\{i \in \mathbf{N} : 1 \leq i \leq n + 1\}$ to
$X$. We have to prove that

$$\sum_{i=1}^{n+1} f(g(i)) = \sum_{i=1}^{n+1} f(h(i)). \qquad (7.1)$$

Let $x := g(n + 1)$; thus $x$ is an element of $X$. By definition of finite
series, we can expand the left-hand side of (7.1) as

$$\sum_{i=1}^{n+1} f(g(i)) = \left(\sum_{i=1}^n f(g(i))\right) + f(x).$$

Now let us look at the right-hand side of (7.1). Ideally we would like to
have $h(n + 1)$ also equal to $x$ - this would allow us to use the inductive
hypothesis $P(n)$ much more easily - but we cannot assume this.
However, since $h$ is a bijection, we do know that there is some index $j$, with
$1 \leq j \leq n + 1$, for which $h(j) = x$. We now use Lemma 7.1.4 and the
definition of finite series to write

$$\sum_{i=1}^{n+1} f(h(i)) = \left(\sum_{i=1}^{j} f(h(i))\right) + \left(\sum_{i=j+1}^{n+1} f(h(i))\right)$$

$$= \left(\sum_{i=1}^{j-1} f(h(i))\right) + f(h(j)) + \left(\sum_{i=j+1}^{n+1} f(h(i))\right)$$

$$= \left(\sum_{i=1}^{j-1} f(h(i))\right) + f(x) + \left(\sum_{i=j}^{n} f(h(i+1))\right).$$

We now define the function $\tilde{h} : \{i \in \mathbf{N} : 1 \leq i \leq n\} \to X - \{x\}$ by
setting $\tilde{h}(i) := h(i)$ when $i < j$ and $\tilde{h}(i) := h(i + 1)$ when $i \geq j$. We can
thus write the right-hand side of (7.1) as

$$\left(\sum_{i=1}^{j-1} f(\tilde{h}(i))\right) + f(x) + \left(\sum_{i=j}^{n} f(\tilde{h}(i))\right) = \left(\sum_{i=1}^{n} f(\tilde{h}(i))\right) + f(x)$$

where we have used Lemma 7.1.4 once again. Thus to finish the proof
of (7.1) we have to show that

$$\sum_{i=1}^n f(g(i)) = \sum_{i=1}^n f(\tilde{h}(i)). \qquad (7.2)$$

But the function $g$ (when restricted to $\{i \in \mathbf{N} : 1 \leq i \leq n\}$) is a bijection
from $\{i \in \mathbf{N} : 1 \leq i \leq n\} \to X - \{x\}$ (why?). The function $\tilde{h}$ is also
a bijection from $\{i \in \mathbf{N} : 1 \leq i \leq n\} \to X - \{x\}$ (why? cf. Lemma
3.6.9). Since $X - \{x\}$ has $n$ elements (by Lemma 3.6.9), the claim (7.2)
then follows directly from the induction hypothesis $P(n)$. □

Remark 7.1.10. Suppose that $X$ is a set, that $P(x)$ is a property
pertaining to an element $x$ of $X$, and $f : \{y \in X : P(y) \text{ is true}\} \to \mathbf{R}$ is
a function. Then we will often abbreviate

$$\sum_{x \in \{y \in X : P(y) \text{ is true}\}} f(x)$$

as $\sum_{x \in X : P(x) \text{ is true}} f(x)$ or even as $\sum_{P(x) \text{ is true}} f(x)$ when there is no
chance of confusion. For instance, $\sum_{n \in \mathbf{N} : 2 \leq n \leq 4} f(n)$ or $\sum_{2 \leq n \leq 4} f(n)$ is
short-hand for $\sum_{n \in \{2,3,4\}} f(n) = f(2) + f(3) + f(4)$.

The following properties of summation on finite sets are fairly obvi- 
ous, but do require a rigorous proof: 

Proposition 7.1.11 (Basic properties of summation over finite sets).

(a) If $X$ is empty, and $f : X \to \mathbf{R}$ is a function (i.e., $f$ is the empty
function), we have

$$\sum_{x \in X} f(x) = 0.$$

(b) If $X$ consists of a single element, $X = \{x_0\}$, and $f : X \to \mathbf{R}$ is a
function, we have

$$\sum_{x \in X} f(x) = f(x_0).$$

(c) (Substitution, part I) If $X$ is a finite set, $f : X \to \mathbf{R}$ is a function,
and $g : Y \to X$ is a bijection, then

$$\sum_{x \in X} f(x) = \sum_{y \in Y} f(g(y)).$$

(d) (Substitution, part II) Let $n \leq m$ be integers, and let $X$ be the set
$X := \{i \in \mathbf{Z} : n \leq i \leq m\}$. If $a_i$ is a real number assigned to each
integer $i \in X$, then we have

$$\sum_{i=n}^m a_i = \sum_{i \in X} a_i.$$

(e) Let $X, Y$ be disjoint finite sets (so $X \cap Y = \emptyset$), and $f : X \cup Y \to \mathbf{R}$
is a function. Then we have

$$\sum_{z \in X \cup Y} f(z) = \left(\sum_{x \in X} f(x)\right) + \left(\sum_{y \in Y} f(y)\right).$$

(f) (Linearity, part I) Let $X$ be a finite set, and let $f : X \to \mathbf{R}$ and
$g : X \to \mathbf{R}$ be functions. Then

$$\sum_{x \in X} (f(x) + g(x)) = \sum_{x \in X} f(x) + \sum_{x \in X} g(x).$$

(g) (Linearity, part II) Let $X$ be a finite set, let $f : X \to \mathbf{R}$ be a
function, and let $c$ be a real number. Then

$$\sum_{x \in X} c f(x) = c \sum_{x \in X} f(x).$$

(h) (Monotonicity) Let $X$ be a finite set, and let $f : X \to \mathbf{R}$ and
$g : X \to \mathbf{R}$ be functions such that $f(x) \leq g(x)$ for all $x \in X$.
Then we have

$$\sum_{x \in X} f(x) \leq \sum_{x \in X} g(x).$$

(i) (Triangle inequality) Let $X$ be a finite set, and let $f : X \to \mathbf{R}$ be
a function, then

$$\left|\sum_{x \in X} f(x)\right| \leq \sum_{x \in X} |f(x)|.$$

Proof. See Exercise 7.1.2. □

Remark 7.1.12. The substitution rule in Proposition 7.1.11(c) can be
thought of as making the substitution $x := g(y)$ (hence the name). Note
that the assumption that $g$ is a bijection is essential; can you see why
the rule will fail when $g$ is not one-to-one or not onto? From Proposition
7.1.11(c) and (d) we see that

$$\sum_{i=n}^m a_i = \sum_{i=n}^m a_{f(i)}$$

for any bijection $f$ from the set $\{i \in \mathbf{Z} : n \leq i \leq m\}$ to itself. Informally,
this means that we can rearrange the elements of a finite sequence at
will and still obtain the same value.
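
A small experiment shows what goes wrong without the bijection hypothesis. In the sketch below, the sets, the function $f$, and the two maps from $Y$ to $X$ are all hypothetical choices made for illustration: the bijective map preserves the sum, while a map that repeats one element and misses another does not.

```python
def sum_over(S, f):
    """Sum of f(s) over the elements s of a finite collection S."""
    return sum(f(s) for s in S)

X = {1, 2, 3}
Y = {"p", "q", "r"}

def f(x):
    return 10 * x

bijection = {"p": 3, "q": 1, "r": 2}
not_injective = {"p": 1, "q": 1, "r": 2}   # hits 1 twice and never hits 3

direct = sum_over(X, f)
substituted = sum_over(Y, lambda y: f(bijection[y]))
broken = sum_over(Y, lambda y: f(not_injective[y]))
print(direct, substituted, broken)
```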

Now we look at double finite series - finite series of finite series - and 
how they connect with Cartesian products. 

Lemma 7.1.13. Let $X, Y$ be finite sets, and let $f : X \times Y \to \mathbf{R}$ be a
function. Then

$$\sum_{x \in X} \left(\sum_{y \in Y} f(x, y)\right) = \sum_{(x,y) \in X \times Y} f(x, y).$$

Proof. Let $n$ be the number of elements in $X$. We will use induction on
$n$ (cf. Proposition 7.1.8); i.e., we let $P(n)$ be the assertion that Lemma
7.1.13 is true for any set $X$ with $n$ elements, and any finite set $Y$ and
any function $f : X \times Y \to \mathbf{R}$. We wish to prove $P(n)$ for all natural
numbers $n$.

The base case $P(0)$ is easy, following from Proposition 7.1.11(a)
(why?). Now suppose that $P(n)$ is true; we now show that $P(n + 1)$
is true. Let $X$ be a set with $n + 1$ elements. In particular, by Lemma
3.6.9, we can write $X = X' \cup \{x_0\}$, where $x_0$ is an element of $X$ and
$X' := X - \{x_0\}$ has $n$ elements. Then by Proposition 7.1.11(e) we have

$$\sum_{x \in X} \left(\sum_{y \in Y} f(x, y)\right) = \left(\sum_{x \in X'} \left(\sum_{y \in Y} f(x, y)\right)\right) + \left(\sum_{y \in Y} f(x_0, y)\right);$$

by the induction hypothesis this is equal to

$$\sum_{(x,y) \in X' \times Y} f(x, y) + \left(\sum_{y \in Y} f(x_0, y)\right).$$

By Proposition 7.1.11(c) this is equal to

$$\sum_{(x,y) \in X' \times Y} f(x, y) + \left(\sum_{(x,y) \in \{x_0\} \times Y} f(x, y)\right).$$

By Proposition 7.1.11(e) this is equal to

$$\sum_{(x,y) \in X \times Y} f(x, y)$$

(why?) as desired. □

Corollary 7.1.14 (Fubini’s theorem for finite series). Let $X$, $Y$ be finite
sets, and let $f : X \times Y \to \mathbf{R}$ be a function. Then

$$\sum_{x \in X} \left( \sum_{y \in Y} f(x,y) \right) = \sum_{(x,y) \in X \times Y} f(x,y) = \sum_{(y,x) \in Y \times X} f(x,y) = \sum_{y \in Y} \left( \sum_{x \in X} f(x,y) \right).$$

Proof. In light of Lemma 7.1.13, it suffices to show that

$$\sum_{(x,y) \in X \times Y} f(x,y) = \sum_{(y,x) \in Y \times X} f(x,y).$$

But this follows from Proposition 7.1.11(c) by applying the bijection
$h : X \times Y \to Y \times X$ defined by $h(x,y) := (y,x)$. (Why is this a bijection,
and why does Proposition 7.1.11(c) give us what we want?) □
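
As an informal illustration of Fubini's theorem for finite series, the sketch below evaluates the three sums for one concrete choice. The sets `X`, `Y` and the function `f` are hypothetical examples of ours; any finite sets and any real-valued function would serve equally well.

```python
from fractions import Fraction
from itertools import product

# Hypothetical finite sets and function, chosen only for illustration.
X = [1, 2, 3]
Y = [1, 2, 3, 4]

def f(x, y):
    """An arbitrary function on X x Y with exact rational values."""
    return Fraction(x, x + y)

# Sum over x of (sum over y), the single sum over X x Y, and sum over y of (sum over x).
rows_first = sum(sum(f(x, y) for y in Y) for x in X)
over_product = sum(f(x, y) for (x, y) in product(X, Y))
cols_first = sum(sum(f(x, y) for x in X) for y in Y)

print(rows_first, over_product, cols_first)
```

The three printed values coincide. The order of summation matters only once we pass to infinite sums.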

Remark 7.1.15. This should be contrasted with Example 1.2.5; thus 
we anticipate something interesting to happen when we move from finite 
sums to infinite sums. However, see Theorem 8.2.2. 

— Exercises — 

Exercise 7.1.1. Prove Lemma 7.1.4. (Hint: you will need to use induction, but 
the base case might not necessarily be at 0.) 

Exercise 7.1.2. Prove Proposition 7.1.11. (Hint: this is not as lengthy as it 
may first appear. It is largely a matter of choosing the right bijections to turn 
these sums over sets into finite series, and then applying Lemma 7.1.4.) 

Exercise 7.1.3. Form a definition for the finite products $\prod_{i=1}^{n} a_i$ and
$\prod_{x \in X} f(x)$. Which of the above results for finite series have analogues for
finite products? (Note that it is dangerous to apply logarithms because some
of the $a_i$ or $f(x)$ could be zero or negative. Besides, we haven’t defined
logarithms yet.)

Exercise 7.1.4. Define the factorial function $n!$ for natural numbers $n$ by the
recursive definition $0! := 1$ and $(n+1)! := n! \times (n+1)$. If $x$ and $y$ are real
numbers, prove the binomial formula

$$(x+y)^n = \sum_{j=0}^{n} \frac{n!}{j!(n-j)!} x^j y^{n-j}$$

for all natural numbers $n$. (Hint: induct on $n$.)

Exercise 7.1.5. Let $X$ be a finite set, let $m$ be an integer, and for each $x \in X$ let
$(a_n(x))_{n=m}^{\infty}$ be a convergent sequence of real numbers. Show that the sequence
$(\sum_{x \in X} a_n(x))_{n=m}^{\infty}$ is convergent, and

$$\lim_{n \to \infty} \sum_{x \in X} a_n(x) = \sum_{x \in X} \lim_{n \to \infty} a_n(x).$$

(Hint: induct on the cardinality of $X$, and use Theorem 6.1.19(a).) Thus we
may always interchange finite sums with convergent limits. Things however get
trickier with infinite sums; see Exercise 11.47.11.


7.2 Infinite series 

We are now ready to sum infinite series. 

Definition 7.2.1 (Formal infinite series). A (formal) infinite series is
any expression of the form

$$\sum_{n=m}^{\infty} a_n,$$

where $m$ is an integer, and $a_n$ is a real number for any integer $n \ge m$.
We sometimes write this series as

$$a_m + a_{m+1} + a_{m+2} + \ldots.$$

At present, this series is only defined formally; we have not set this
sum equal to any real number. The notation $a_m + a_{m+1} + a_{m+2} + \ldots$ is of
course designed to look very suggestively like a sum, but is not actually
a finite sum because of the “$\ldots$” symbol. To rigorously define what the
series actually sums to, we need another definition.
Definition 7.2.2 (Convergence of series). Let $\sum_{n=m}^{\infty} a_n$ be a formal
infinite series. For any integer $N \ge m$, we define the $N$th partial sum
$S_N$ of this series to be $S_N := \sum_{n=m}^{N} a_n$; of course, $S_N$ is a real number.
If the sequence $(S_N)_{N=m}^{\infty}$ converges to some limit $L$ as $N \to \infty$, then
we say that the infinite series $\sum_{n=m}^{\infty} a_n$ is convergent, and converges
to $L$; we also write $L = \sum_{n=m}^{\infty} a_n$, and say that $L$ is the sum of the
infinite series $\sum_{n=m}^{\infty} a_n$. If the partial sums $S_N$ diverge, then we say
that the infinite series $\sum_{n=m}^{\infty} a_n$ is divergent, and we do not assign any
real number value to that series.


Remark 7.2.3. Note that Proposition 6.1.7 shows that if a series converges,
then it has a unique sum, so it is safe to talk about the sum
$L = \sum_{n=m}^{\infty} a_n$ of a convergent series.

Examples 7.2.4. Consider the formal infinite series

$$\sum_{n=1}^{\infty} 2^{-n} = 2^{-1} + 2^{-2} + 2^{-3} + \ldots.$$

The partial sums can be verified to equal

$$S_N = \sum_{n=1}^{N} 2^{-n} = 1 - 2^{-N}$$

by an easy induction argument (or by Lemma 7.3.3 below); the sequence
$1 - 2^{-N}$ converges to 1 as $N \to \infty$, and hence we have

$$\sum_{n=1}^{\infty} 2^{-n} = 1.$$

In particular, this series is convergent. On the other hand, if we consider
the series

$$\sum_{n=1}^{\infty} 2^{n} = 2^{1} + 2^{2} + 2^{3} + \ldots$$

then the partial sums are

$$S_N = \sum_{n=1}^{N} 2^{n} = 2^{N+1} - 2$$

and this is easily shown to be an unbounded sequence, and hence divergent.
Thus the series $\sum_{n=1}^{\infty} 2^{n}$ is divergent.
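
The two closed forms for the partial sums in Examples 7.2.4 are easy to confirm for small $N$. The sketch below does so with exact arithmetic; the function names are ours and the check is only a sanity test, not a replacement for the induction argument.

```python
from fractions import Fraction

def convergent_partial_sum(N):
    """S_N = sum_{n=1}^{N} 2^{-n}, as an exact rational number."""
    return sum(Fraction(1, 2**n) for n in range(1, N + 1))

def divergent_partial_sum(N):
    """S_N = sum_{n=1}^{N} 2^{n}, as an exact integer."""
    return sum(2**n for n in range(1, N + 1))

# Print a few partial sums of each series side by side.
for N in (1, 2, 5, 10):
    print(N, convergent_partial_sum(N), divergent_partial_sum(N))
```

The first column of sums creeps up towards 1, while the second grows without bound.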
Now we address the question of when a series converges. The following
proposition shows that a series converges iff the “tail” of the
sequence is eventually less than $\varepsilon$ for any $\varepsilon > 0$:

Proposition 7.2.5. Let $\sum_{n=m}^{\infty} a_n$ be a formal series of real numbers.
Then $\sum_{n=m}^{\infty} a_n$ converges if and only if, for every real number $\varepsilon > 0$,
there exists an integer $N \ge m$ such that

$$\left| \sum_{n=p}^{q} a_n \right| \le \varepsilon \text{ for all } p, q \ge N.$$

Proof. See Exercise 7.2.2. □

This Proposition, by itself, is not very handy, because it is not so
easy to compute the partial sums $\sum_{n=p}^{q} a_n$ in practice. However, it has
a number of useful corollaries. For instance:


Corollary 7.2.6 (Zero test). Let $\sum_{n=m}^{\infty} a_n$ be a convergent series of
real numbers. Then we must have $\lim_{n \to \infty} a_n = 0$. To put this another
way, if $\lim_{n \to \infty} a_n$ is non-zero or divergent, then the series $\sum_{n=m}^{\infty} a_n$ is
divergent.

Proof. See Exercise 7.2.3. □ 


Example 7.2.7. The sequence $a_n := 1$ does not converge to 0 as $n \to \infty$,
so we know that $\sum_{n=1}^{\infty} 1$ is a divergent series. (Note however that
1, 1, 1, 1, ... is a convergent sequence; convergence of series is a different
notion from convergence of sequences.) Similarly, the sequence $a_n := (-1)^n$
diverges, and in particular does not converge to zero; thus the
series $\sum_{n=1}^{\infty} (-1)^n$ is also divergent.

If a sequence $(a_n)_{n=m}^{\infty}$ does converge to zero, then the series $\sum_{n=m}^{\infty} a_n$
may or may not be convergent; it depends on the series. For instance,
we will soon see that the series $\sum_{n=1}^{\infty} 1/n$ is divergent despite the fact
that $1/n$ converges to 0 as $n \to \infty$.

Definition 7.2.8 (Absolute convergence). Let $\sum_{n=m}^{\infty} a_n$ be a formal
series of real numbers. We say that this series is absolutely convergent
iff the series $\sum_{n=m}^{\infty} |a_n|$ is convergent.

In order to distinguish convergence from absolute convergence, we 
sometimes refer to the former as conditional convergence. 
Proposition 7.2.9 (Absolute convergence test). Let $\sum_{n=m}^{\infty} a_n$ be a formal
series of real numbers. If this series is absolutely convergent, then it
is also conditionally convergent. Furthermore, in this case we have the
triangle inequality

$$\left| \sum_{n=m}^{\infty} a_n \right| \le \sum_{n=m}^{\infty} |a_n|.$$

Proof. See Exercise 7.2.4. □


Remark 7.2.10. The converse to this proposition is not true; there exist 
series which are conditionally convergent but not absolutely convergent. 
See Example 7.2.13. 

Remark 7.2.11. We consider the class of conditionally convergent series
to include the class of absolutely convergent series as a subclass.
Thus when we say a statement such as “$\sum_{n=m}^{\infty} a_n$ is conditionally
convergent”, this does not automatically mean that $\sum_{n=m}^{\infty} a_n$ is not absolutely
convergent. If we wish to say that a series is conditionally convergent
but not absolutely convergent, then we will instead use a phrasing such
as “$\sum_{n=m}^{\infty} a_n$ is only conditionally convergent”, or “$\sum_{n=m}^{\infty} a_n$ converges
conditionally, but not absolutely”.

Proposition 7.2.12 (Alternating series test). Let $(a_n)_{n=m}^{\infty}$ be a sequence
of real numbers which are non-negative and decreasing, thus
$a_n \ge 0$ and $a_n \ge a_{n+1}$ for every $n \ge m$. Then the series $\sum_{n=m}^{\infty} (-1)^n a_n$
is convergent if and only if the sequence $a_n$ converges to 0 as $n \to \infty$.

Proof. From the zero test, we know that if $\sum_{n=m}^{\infty} (-1)^n a_n$ is a convergent
series, then the sequence $(-1)^n a_n$ converges to 0, which implies that $a_n$
also converges to 0, since $(-1)^n a_n$ and $a_n$ have the same distance from
0.

Now suppose conversely that $a_n$ converges to 0. For each $N$, let $S_N$
be the partial sum $S_N := \sum_{n=m}^{N} (-1)^n a_n$; our job is to show that $S_N$
converges. Observe that

$$S_{N+2} = S_N + (-1)^{N+1} a_{N+1} + (-1)^{N+2} a_{N+2} = S_N + (-1)^{N+1} (a_{N+1} - a_{N+2}).$$

But by hypothesis, $(a_{N+1} - a_{N+2})$ is non-negative. Thus we have $S_{N+2} \ge S_N$
when $N$ is odd and $S_{N+2} \le S_N$ if $N$ is even.

Now suppose that $N$ is even. From the above discussion and induction
we see that $S_{N+2k} \le S_N$ for all natural numbers $k$ (why?).
Also we have $S_{N+2k+1} \ge S_{N+1} = S_N - a_{N+1}$ (why?). Finally, we have
$S_{N+2k+1} = S_{N+2k} - a_{N+2k+1} \le S_{N+2k}$ (why?). Thus we have

$$S_N - a_{N+1} \le S_{N+2k+1} \le S_{N+2k} \le S_N$$

for all $k$. In particular, we have

$$S_N - a_{N+1} \le S_n \le S_N \text{ for all } n \ge N$$

(why?). In particular, the sequence $S_n$ is eventually $a_{N+1}$-steady. But
the sequence $a_N$ converges to 0 as $N \to \infty$, thus this implies that $S_n$ is
eventually $\varepsilon$-steady for every $\varepsilon > 0$ (why?). Thus $S_n$ converges, and so
the series $\sum_{n=m}^{\infty} (-1)^n a_n$ is convergent. □

Example 7.2.13. The sequence $(1/n)_{n=1}^{\infty}$ is non-negative, decreasing,
and converges to zero. Thus $\sum_{n=1}^{\infty} (-1)^n/n$ is convergent (but it is not
absolutely convergent, because $\sum_{n=1}^{\infty} 1/n$ diverges, see Corollary 7.3.7).
Thus lack of absolute convergence does not imply lack of conditional
convergence, even though absolute convergence implies conditional
convergence.
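
The trapping inequality $S_N - a_{N+1} \le S_n \le S_N$ (for even $N$ and all $n \ge N$) from the proof of the alternating series test can be observed directly for the series $\sum_{n=1}^{\infty} (-1)^n/n$. This is a minimal sketch under our own choices: the cutoff of 200 terms and the value `N = 10` are arbitrary, and the helper name is hypothetical.

```python
from fractions import Fraction

def alternating_partial_sums(count):
    """Return [S_1, ..., S_count] where S_N = sum_{n=1}^{N} (-1)^n / n."""
    sums, total = [], Fraction(0)
    for n in range(1, count + 1):
        total += Fraction((-1) ** n, n)
        sums.append(total)
    return sums

S = alternating_partial_sums(200)      # S[N - 1] holds S_N
N = 10                                  # an arbitrary even index
lower = S[N - 1] - Fraction(1, N + 1)   # S_N - a_{N+1}, with a_n = 1/n
upper = S[N - 1]                        # S_N

# Every later partial sum (up to the cutoff) stays inside [lower, upper].
trapped = all(lower <= S[n - 1] <= upper for n in range(N, 201))
print(float(lower), float(upper), trapped)
```

Increasing `N` shrinks the window `upper - lower = 1/(N + 1)`, which is the sense in which the partial sums become eventually $\varepsilon$-steady.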

Some basic identities concerning convergent series are collected below.

Proposition 7.2.14 (Series laws).

(a) If $\sum_{n=m}^{\infty} a_n$ is a series of real numbers converging to $x$, and
$\sum_{n=m}^{\infty} b_n$ is a series of real numbers converging to $y$, then
$\sum_{n=m}^{\infty} (a_n + b_n)$ is also a convergent series, and converges to $x + y$.
In particular, we have

$$\sum_{n=m}^{\infty} (a_n + b_n) = \sum_{n=m}^{\infty} a_n + \sum_{n=m}^{\infty} b_n.$$

(b) If $\sum_{n=m}^{\infty} a_n$ is a series of real numbers converging to $x$, and $c$ is
a real number, then $\sum_{n=m}^{\infty} (c a_n)$ is also a convergent series, and
converges to $cx$. In particular, we have

$$\sum_{n=m}^{\infty} (c a_n) = c \sum_{n=m}^{\infty} a_n.$$

(c) Let $\sum_{n=m}^{\infty} a_n$ be a series of real numbers, and let $k \ge 0$ be an
integer. If one of the two series $\sum_{n=m}^{\infty} a_n$ and $\sum_{n=m+k}^{\infty} a_n$ is
convergent, then the other one is also, and we have the identity

$$\sum_{n=m}^{\infty} a_n = \sum_{n=m}^{m+k-1} a_n + \sum_{n=m+k}^{\infty} a_n.$$

(d) Let $\sum_{n=m}^{\infty} a_n$ be a series of real numbers converging to $x$, and let
$k$ be an integer. Then $\sum_{n=m+k}^{\infty} a_{n-k}$ also converges to $x$.

Proof. See Exercise 7.2.5. □


From Proposition 7.2.14(c) we see that the convergence of a series 
does not depend on the first few elements of the series (though of course 
those elements do influence which value the series converges to) . Because 
of this, we will usually not pay much attention as to what the initial 
index m of the series is. 

There is one type of series, called telescoping series, which is easy
to sum:

Lemma 7.2.15 (Telescoping series). Let $(a_n)_{n=0}^{\infty}$ be a sequence of real
numbers which converge to 0, i.e., $\lim_{n \to \infty} a_n = 0$. Then the series
$\sum_{n=0}^{\infty} (a_n - a_{n+1})$ converges to $a_0$.

Proof. See Exercise 7.2.6. □ 


— Exercises — 

Exercise 7.2.1. Is the series $\sum_{n=1}^{\infty} (-1)^n$ convergent or divergent? Justify your
answer. Can you now resolve the difficulty in Example 1.2.2?

Exercise 7.2.2. Prove Proposition 7.2.5. (Hint: use Proposition 6.1.12 and 
Theorem 6.4.18.) 

Exercise 7.2.3. Use Proposition 7.2.5 to prove Corollary 7.2.6. 

Exercise 7.2.4. Prove Proposition 7.2.9. (Hint: use Proposition 7.2.5 and 
Proposition 7.1.4(e).) 

Exercise 7.2.5. Prove Proposition 7.2.14. (Hint: use Theorem 6.1.19.) 
Exercise 7.2.6. Prove Lemma 7.2.15. (Hint: First work out what the partial
sums $\sum_{n=0}^{N} (a_n - a_{n+1})$ should be, and prove your assertion using induction.)
How does the proposition change if we assume that $a_n$ does not converge to
zero, but instead converges to some other real number $L$?

7.3 Sums of non-negative numbers 

Now we specialize the preceding discussion in order to consider sums
$\sum_{n=m}^{\infty} a_n$ where all the terms $a_n$ are non-negative. This situation comes
up, for instance, from the absolute convergence test, since the absolute
value $|a_n|$ of a real number $a_n$ is always non-negative. Note that when
all the terms in a series are non-negative, there is no distinction between
conditional convergence and absolute convergence.

Suppose $\sum_{n=m}^{\infty} a_n$ is a series of non-negative numbers. Then the
partial sums $S_N := \sum_{n=m}^{N} a_n$ are increasing, i.e., $S_{N+1} \ge S_N$ for all
$N \ge m$ (why?). From Proposition 6.3.8 and Corollary 6.1.17, we thus
see that the sequence $(S_N)_{N=m}^{\infty}$ is convergent if and only if it has an
upper bound $M$. In other words, we have just shown

Proposition 7.3.1. Let $\sum_{n=m}^{\infty} a_n$ be a formal series of non-negative
real numbers. Then this series is convergent if and only if there is a real
number $M$ such that

$$\sum_{n=m}^{N} a_n \le M \text{ for all integers } N \ge m.$$

A simple corollary of this is 

Corollary 7.3.2 (Comparison test). Let $\sum_{n=m}^{\infty} a_n$ and $\sum_{n=m}^{\infty} b_n$ be two
formal series of real numbers, and suppose that $|a_n| \le b_n$ for all $n \ge m$.
Then if $\sum_{n=m}^{\infty} b_n$ is convergent, then $\sum_{n=m}^{\infty} a_n$ is absolutely convergent,
and in fact

$$\left| \sum_{n=m}^{\infty} a_n \right| \le \sum_{n=m}^{\infty} |a_n| \le \sum_{n=m}^{\infty} b_n.$$

Proof. See Exercise 7.3.1. □

We can also run the comparison test in the contrapositive: if we
have $|a_n| \le b_n$ for all $n \ge m$, and $\sum_{n=m}^{\infty} a_n$ is not absolutely convergent,
then $\sum_{n=m}^{\infty} b_n$ is not conditionally convergent. (Why does this follow
immediately from Corollary 7.3.2?)
A useful series to use in the comparison test is the geometric series

$$\sum_{n=0}^{\infty} x^n,$$

where $x$ is some real number:

Lemma 7.3.3 (Geometric series). Let $x$ be a real number. If $|x| \ge 1$,
then the series $\sum_{n=0}^{\infty} x^n$ is divergent. If however $|x| < 1$, then the series
is absolutely convergent and

$$\sum_{n=0}^{\infty} x^n = 1/(1-x).$$

Proof. See Exercise 7.3.2. □
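
To see Lemma 7.3.3 at work numerically, the sketch below compares partial sums of the geometric series with the claimed limit $1/(1-x)$ for one sample ratio. The value `x = -2/3` is an arbitrary choice of ours with $|x| < 1$, and the helper name is hypothetical; exact rationals avoid any rounding questions.

```python
from fractions import Fraction

def geometric_partial_sum(x, N):
    """Partial sum sum_{n=0}^{N} x^n, computed term by term."""
    return sum(x**n for n in range(N + 1))

x = Fraction(-2, 3)            # sample ratio with |x| < 1 (our assumption)
limit = 1 / (1 - x)            # the value claimed by Lemma 7.3.3

# Distance between the N-th partial sum and the claimed limit, N = 0, ..., 39.
gaps = [abs(limit - geometric_partial_sum(x, N)) for N in range(40)]
print(limit, float(gaps[0]), float(gaps[-1]))
```

The gaps shrink at every step, in line with the closed form for the partial sums quoted in the hint to Exercise 7.3.2.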

We now give a useful criterion, known as the Cauchy criterion, to
test whether a series of non-negative but decreasing terms is convergent.

Proposition 7.3.4 (Cauchy criterion). Let $(a_n)_{n=1}^{\infty}$ be a decreasing sequence
of non-negative real numbers (so $a_n \ge 0$ and $a_{n+1} \le a_n$ for all
$n \ge 1$). Then the series $\sum_{n=1}^{\infty} a_n$ is convergent if and only if the series

$$\sum_{k=0}^{\infty} 2^k a_{2^k} = a_1 + 2a_2 + 4a_4 + 8a_8 + \ldots$$

is convergent.

Remark 7.3.5. An interesting feature of this criterion is that it only
uses a small number of elements of the sequence $a_n$ (namely, those elements
whose index $n$ is a power of 2, $n = 2^k$) in order to determine
whether the whole series is convergent or not.

Proof. Let $S_N := \sum_{n=1}^{N} a_n$ be the partial sums of $\sum_{n=1}^{\infty} a_n$, and let
$T_K := \sum_{k=0}^{K} 2^k a_{2^k}$ be the partial sums of $\sum_{k=0}^{\infty} 2^k a_{2^k}$. In light of
Proposition 7.3.1, our task is to show that the sequence $(S_N)_{N=1}^{\infty}$ is bounded
if and only if the sequence $(T_K)_{K=0}^{\infty}$ is bounded. To do this we need the
following claim:

Lemma 7.3.6. For any natural number $K$, we have $S_{2^{K+1}-1} \le T_K \le 2S_{2^K}$.
Proof. We use induction on $K$. First we prove the claim when $K = 0$,
i.e.

$$S_1 \le T_0 \le 2S_1.$$

This becomes

$$a_1 \le a_1 \le 2a_1$$

which is clearly true, since $a_1$ is non-negative.
Now suppose the claim has been proven for $K$, and now we try to
prove it for $K + 1$:

$$S_{2^{K+2}-1} \le T_{K+1} \le 2S_{2^{K+1}}.$$

Clearly we have

$$T_{K+1} = T_K + 2^{K+1} a_{2^{K+1}}.$$

Also, we have (using Lemma 7.1.4(a) and (f), and the hypothesis that
the $a_n$ are decreasing)

$$S_{2^{K+1}} = S_{2^K} + \sum_{n=2^K+1}^{2^{K+1}} a_n \ge S_{2^K} + \sum_{n=2^K+1}^{2^{K+1}} a_{2^{K+1}} = S_{2^K} + 2^K a_{2^{K+1}}$$

and hence

$$2S_{2^{K+1}} \ge 2S_{2^K} + 2^{K+1} a_{2^{K+1}}.$$

Similarly we have

$$S_{2^{K+2}-1} = S_{2^{K+1}-1} + \sum_{n=2^{K+1}}^{2^{K+2}-1} a_n \le S_{2^{K+1}-1} + \sum_{n=2^{K+1}}^{2^{K+2}-1} a_{2^{K+1}} = S_{2^{K+1}-1} + 2^{K+1} a_{2^{K+1}}.$$

Combining these inequalities with the induction hypothesis

$$S_{2^{K+1}-1} \le T_K \le 2S_{2^K}$$

we obtain

$$S_{2^{K+2}-1} \le T_{K+1} \le 2S_{2^{K+1}}$$

as desired. This proves the claim. □
From this claim we see that if $(S_N)_{N=1}^{\infty}$ is bounded, then $(S_{2^K})_{K=0}^{\infty}$
is bounded, and hence $(T_K)_{K=0}^{\infty}$ is bounded. Conversely, if $(T_K)_{K=0}^{\infty}$ is
bounded, then the claim implies that $S_{2^{K+1}-1}$ is bounded, i.e., there is
an $M$ such that $S_{2^{K+1}-1} \le M$ for all natural numbers $K$. But one can
easily show (using induction) that $2^{K+1} - 1 \ge K + 1$, and hence that
$S_{K+1} \le M$ for all natural numbers $K$, hence $(S_N)_{N=1}^{\infty}$ is bounded. □
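
The two-sided bound of Lemma 7.3.6 can be spot-checked on concrete decreasing sequences. The sketch below is only an illustration under our own assumptions: the test sequences $1/n$ and $1/n^2$, the range $K \le 7$, and all function names are our choices.

```python
from fractions import Fraction

def S(a, N):
    """Partial sum S_N = a(1) + ... + a(N)."""
    return sum(a(n) for n in range(1, N + 1))

def T(a, K):
    """Condensed partial sum T_K = sum_{k=0}^{K} 2^k a(2^k)."""
    return sum(2**k * a(2**k) for k in range(K + 1))

def lemma_holds(a, max_K):
    """Check S_{2^{K+1}-1} <= T_K <= 2 S_{2^K} for K = 0, ..., max_K."""
    return all(
        S(a, 2**(K + 1) - 1) <= T(a, K) <= 2 * S(a, 2**K)
        for K in range(max_K + 1)
    )

def harmonic(n):
    return Fraction(1, n)

def inverse_square(n):
    return Fraction(1, n * n)

print(lemma_holds(harmonic, 7), lemma_holds(inverse_square, 7))
print(T(harmonic, 7), T(inverse_square, 7))
```

For $a_n = 1/n$ the condensed sums are $T_K = K + 1$, which is unbounded; for $a_n = 1/n^2$ they stay below 2. The lemma transfers each behaviour to the original partial sums.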


Corollary 7.3.7. Let $q > 0$ be a rational number. Then the series
$\sum_{n=1}^{\infty} 1/n^q$ is convergent when $q > 1$ and divergent when $q \le 1$.

Proof. The sequence $(1/n^q)_{n=1}^{\infty}$ is non-negative and decreasing (by
Lemma 5.6.9(d)), and so the Cauchy criterion applies. Thus this series
is convergent if and only if

$$\sum_{k=0}^{\infty} 2^k \frac{1}{(2^k)^q}$$

is convergent. But by the laws of exponentiation (Lemma 5.6.9) we can
rewrite this as the geometric series

$$\sum_{k=0}^{\infty} (2^{1-q})^k.$$

As mentioned earlier, the geometric series $\sum_{k=0}^{\infty} x^k$ converges if and
only if $|x| < 1$. Thus the series $\sum_{n=1}^{\infty} 1/n^q$ will converge if and only if
$|2^{1-q}| < 1$, which happens if and only if $q > 1$ (why? Try proving it just
using Lemma 5.6.9, and without using logarithms). □

In particular, the series $\sum_{n=1}^{\infty} 1/n$ (also known as the harmonic series)
is divergent, as claimed earlier. However, the series $\sum_{n=1}^{\infty} 1/n^2$ is
convergent.
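
The contrast between $q = 1$ and $q = 2$ can be seen in floating point. The sketch below is a rough illustration only: the bounds it checks are the ones that Lemma 7.3.6 gives for these two sequences (namely $S_N \le 2$ when $q = 2$, and $S_{2^K} \ge (K+1)/2$ when $q = 1$), and the cutoff $2^{12}$ and the helper name are our choices.

```python
def partial_sums(term, count):
    """Return [S_1, ..., S_count] for S_N = term(1) + ... + term(N)."""
    sums, total = [], 0.0
    for n in range(1, count + 1):
        total += term(n)
        sums.append(total)
    return sums

N = 2**12
squares = partial_sums(lambda n: 1.0 / n**2, N)   # q = 2
harmonic = partial_sums(lambda n: 1.0 / n, N)     # q = 1

# q = 2: the condensed series is sum (1/2)^k = 2, so every S_N is at most 2.
squares_bounded = all(s <= 2.0 for s in squares)

# q = 1: the condensed partial sums are T_K = K + 1, so S_{2^K} >= (K + 1) / 2.
harmonic_grows = all(harmonic[2**K - 1] >= (K + 1) / 2 for K in range(13))

print(squares[-1], harmonic[-1], squares_bounded, harmonic_grows)
```

The harmonic partial sums grow very slowly, which is why numerical experiments alone can make a divergent series look convergent.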

Remark 7.3.8. The quantity $\sum_{n=1}^{\infty} 1/n^q$, when it converges, is called
$\zeta(q)$, the Riemann-zeta function of $q$. This function is very important
in number theory, and in particular in the distribution of the primes; 
there is a very famous unsolved problem regarding this function, called 
the Riemann hypothesis, but to discuss it further is far beyond the scope 
of this text. I will mention however that there is a US$ 1 million prize - 
and instant fame among all mathematicians - attached to the solution 
to this problem. 
— Exercises —

Exercise 7.3.1. Use Proposition 7.3.1 to prove Corollary 7.3.2. 

Exercise 7.3.2. Prove Lemma 7.3.3. (Hint: for the first part, use the zero
test. For the second part, first use induction to establish the geometric series
formula

$$\sum_{n=0}^{N} x^n = (1 - x^{N+1})/(1 - x)$$

and then apply Lemma 6.5.2.)

Exercise 7.3.3. Let $\sum_{n=0}^{\infty} a_n$ be an absolutely convergent series of real numbers
such that $\sum_{n=0}^{\infty} |a_n| = 0$. Show that $a_n = 0$ for every natural number $n$.

7.4 Rearrangement of series 

One feature of finite sums is that no matter how one rearranges the
terms in a sequence, the total sum is the same. For instance,

$$a_1 + a_2 + a_3 + a_4 + a_5 = a_4 + a_3 + a_5 + a_1 + a_2.$$

A more rigorous statement of this, involving bijections, has already
appeared earlier, see Remark 7.1.12.

One can ask whether the same thing is true for infinite series. If all 
the terms are non-negative, the answer is yes: 

Proposition 7.4.1. Let $\sum_{n=0}^{\infty} a_n$ be a convergent series of non-negative
real numbers, and let $f : \mathbf{N} \to \mathbf{N}$ be a bijection. Then $\sum_{m=0}^{\infty} a_{f(m)}$ is
also convergent, and has the same sum:

$$\sum_{n=0}^{\infty} a_n = \sum_{m=0}^{\infty} a_{f(m)}.$$

Proof. We introduce the partial sums $S_N := \sum_{n=0}^{N} a_n$ and
$T_M := \sum_{m=0}^{M} a_{f(m)}$. We know that the sequences $(S_N)_{N=0}^{\infty}$ and $(T_M)_{M=0}^{\infty}$ are
increasing. Write $L := \sup(S_N)_{N=0}^{\infty}$ and $L' := \sup(T_M)_{M=0}^{\infty}$. By Proposition
6.3.8 we know that $L$ is finite, and in fact $L = \sum_{n=0}^{\infty} a_n$; by
Proposition 6.3.8 again we see that we will thus be done as soon as we
can show that $L' = L$.

Fix $M$, and let $Y$ be the set $Y := \{m \in \mathbf{N} : m \le M\}$. Note that $f$
is a bijection between $Y$ and $f(Y)$. By Proposition 7.1.11, we have

$$T_M = \sum_{m=0}^{M} a_{f(m)} = \sum_{m \in Y} a_{f(m)} = \sum_{n \in f(Y)} a_n.$$

The sequence $(f(m))_{m=0}^{M}$ is finite, hence bounded, i.e., there exists an
$N$ such that $f(m) \le N$ for all $m \le M$. In particular $f(Y)$ is a subset
of $\{n \in \mathbf{N} : n \le N\}$, and so by Proposition 7.1.11 again (and the
assumption that all the $a_n$ are non-negative)

$$T_M = \sum_{n \in f(Y)} a_n \le \sum_{n \in \{n \in \mathbf{N} : n \le N\}} a_n = \sum_{n=0}^{N} a_n = S_N.$$

But since $(S_N)_{N=0}^{\infty}$ has a supremum of $L$, we thus see that $S_N \le L$,
and hence that $T_M \le L$ for all $M$. Since $L'$ is the least upper bound of
$(T_M)_{M=0}^{\infty}$, this implies that $L' \le L$.

A very similar argument (using the inverse $f^{-1}$ instead of $f$) shows
that every $S_N$ is bounded above by $L'$, and hence $L \le L'$. Combining
these two inequalities we obtain $L = L'$, as desired. □

Example 7.4.2. From Corollary 7.3.7 we know that the series

$$\sum_{n=1}^{\infty} 1/n^2 = 1 + 1/4 + 1/9 + 1/16 + 1/25 + 1/36 + \ldots$$

is convergent. Thus, if we interchange every pair of terms, to obtain

$$1/4 + 1 + 1/16 + 1/9 + 1/36 + 1/25 + \ldots$$

we know that this series is also convergent, and has the same sum. (It
turns out that the value of this sum is $\zeta(2) = \pi^2/6$, a fact which we shall
prove in Exercise 11.32.2.)
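
The pair-swapping rearrangement of Example 7.4.2 is simple enough to carry out explicitly. The following is a minimal sketch, with helper names and cutoffs chosen by us; it compares exact partial sums of the two orderings and then a floating-point partial sum against the value $\pi^2/6$ quoted above.

```python
import math
from fractions import Fraction

def original_terms(count):
    """First `count` terms 1, 1/4, 1/9, ... of the series sum 1/n^2."""
    return [Fraction(1, n * n) for n in range(1, count + 1)]

def swap_pairs(terms):
    """Interchange every consecutive pair: t1, t2, t3, t4 -> t2, t1, t4, t3."""
    swapped = list(terms)
    for i in range(0, len(swapped) - 1, 2):
        swapped[i], swapped[i + 1] = swapped[i + 1], swapped[i]
    return swapped

terms = original_terms(100)
swapped = swap_pairs(terms)

# After an even number of terms the two partial sums agree exactly.
even_partial_sums_agree = all(
    sum(terms[:k]) == sum(swapped[:k]) for k in range(0, 101, 2)
)

# Floating-point partial sum with many terms, for comparison with pi^2 / 6.
approx = math.fsum(1.0 / (n * n) for n in range(1, 100001))
print(even_partial_sums_agree, approx, math.pi**2 / 6)
```

This rearrangement is especially mild, since it only moves each term by one position. Proposition 7.4.1 covers arbitrary bijections, where no such termwise comparison of partial sums is available.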

Now we ask what happens when the series is not non-negative. Then
as long as the series is absolutely convergent, we can still do
rearrangements:

Proposition 7.4.3 (Rearrangement of series). Let $\sum_{n=0}^{\infty} a_n$ be an absolutely
convergent series of real numbers, and let $f : \mathbf{N} \to \mathbf{N}$ be a
bijection. Then $\sum_{m=0}^{\infty} a_{f(m)}$ is also absolutely convergent, and has the
same sum:

$$\sum_{n=0}^{\infty} a_n = \sum_{m=0}^{\infty} a_{f(m)}.$$

Proof. (Optional) We apply Proposition 7.4.1 to the infinite series
$\sum_{n=0}^{\infty} |a_n|$, which by hypothesis is a convergent series of non-negative
numbers. If we write $L := \sum_{n=0}^{\infty} |a_n|$, then by Proposition 7.4.1 we
know that $\sum_{m=0}^{\infty} |a_{f(m)}|$ also converges to $L$.

Now write $L' := \sum_{n=0}^{\infty} a_n$. We have to show that $\sum_{m=0}^{\infty} a_{f(m)}$ also
converges to $L'$. In other words, given any $\varepsilon > 0$, we have to find an $M$
such that $\sum_{m=0}^{M'} a_{f(m)}$ is $\varepsilon$-close to $L'$ for every $M' \ge M$.

Since $\sum_{n=0}^{\infty} |a_n|$ is convergent, we can use Proposition 7.2.5 and find
an $N_1$ such that $\sum_{n=p}^{q} |a_n| \le \varepsilon/2$ for all $p, q \ge N_1$. Since $\sum_{n=0}^{\infty} a_n$
converges to $L'$, the partial sums $\sum_{n=0}^{N} a_n$ also converge to $L'$, and so
there exists $N \ge N_1$ such that $\sum_{n=0}^{N} a_n$ is $\varepsilon/2$-close to $L'$.

Now the sequence $(f^{-1}(n))_{n=0}^{N}$ is finite, hence bounded, so there
exists an $M$ such that $f^{-1}(n) \le M$ for all $0 \le n \le N$. In particular, for
any $M' \ge M$, the set $\{f(m) : m \in \mathbf{N}; m \le M'\}$ contains $\{n \in \mathbf{N} : n \le N\}$
(why?). So by Proposition 7.1.11, for any $M' \ge M$

$$\sum_{m=0}^{M'} a_{f(m)} = \sum_{n \in \{f(m) : m \in \mathbf{N}; m \le M'\}} a_n = \sum_{n=0}^{N} a_n + \sum_{n \in X} a_n$$

where $X$ is the set

$$X = \{f(m) : m \in \mathbf{N}; m \le M'\} \backslash \{n \in \mathbf{N} : n \le N\}.$$

The set $X$ is finite, and is therefore bounded by some natural number
$q$; we must therefore have

$$X \subseteq \{n \in \mathbf{N} : N + 1 \le n \le q\}$$

(why?). Thus

$$\left| \sum_{n \in X} a_n \right| \le \sum_{n \in X} |a_n| \le \sum_{n=N+1}^{q} |a_n| \le \varepsilon/2$$

by our choice of $N$. Thus $\sum_{m=0}^{M'} a_{f(m)}$ is $\varepsilon/2$-close to $\sum_{n=0}^{N} a_n$, which
as mentioned before is $\varepsilon/2$-close to $L'$. Thus $\sum_{m=0}^{M'} a_{f(m)}$ is $\varepsilon$-close to $L'$
for all $M' \ge M$, as desired. □

Surprisingly, when the series is not absolutely convergent, then the 
rearrangements are very badly behaved. 

Example 7.4.4. Consider the series

$$1/3 - 1/4 + 1/5 - 1/6 + 1/7 - 1/8 + \ldots.$$

This series is not absolutely convergent (why?), but is conditionally
convergent by the alternating series test, and in fact the sum can
be seen to converge to a positive number (in fact, it converges to
$\ln(2) - 1/2 = 0.193147\ldots$, see Example 11.25.7). Basically, the reason
why the sum is positive is because the quantities $(1/3 - 1/4)$, $(1/5 - 1/6)$,
$(1/7 - 1/8)$ are all positive, which can then be used to show that every
partial sum is positive. (Why? you have to break into two cases, depending
on whether there are an even or odd number of terms in the
partial sum.)

If, however, we rearrange the series to have two negative terms to
each positive term, thus

$$1/3 - 1/4 - 1/6 + 1/5 - 1/8 - 1/10 + 1/7 - 1/12 - 1/14 + \ldots$$

then the partial sums quickly become negative (this is because
$(1/3 - 1/4 - 1/6)$, $(1/5 - 1/8 - 1/10)$, and more generally
$(1/(2n+1) - 1/4n - 1/(4n+2))$ are all negative), and so this series converges to a negative
quantity; in fact, it converges to

$$(\ln(2) - 1)/2 = -.153426\ldots.$$

There is in fact a surprising result of Riemann, which shows that a series
which is conditionally convergent but not absolutely convergent can in
fact be rearranged to converge to any value (or rearranged to diverge,
in fact - see Exercise 8.2.6); see Theorem 8.2.8.
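
The two limits quoted in Example 7.4.4 can be approximated numerically by summing the series in the groupings used in the text. This is a floating-point sketch only: the function names and the cutoff of 100000 groups are our choices, and agreement to a few decimal places is evidence, not proof.

```python
import math

def original_sum(pairs):
    """Sum the first `pairs` pairs (1/(2n+1) - 1/(2n+2)), n = 1, 2, ..."""
    return math.fsum(
        1 / (2 * n + 1) - 1 / (2 * n + 2) for n in range(1, pairs + 1)
    )

def rearranged_sum(blocks):
    """Sum the first `blocks` blocks (1/(2n+1) - 1/(4n) - 1/(4n+2))."""
    return math.fsum(
        1 / (2 * n + 1) - 1 / (4 * n) - 1 / (4 * n + 2)
        for n in range(1, blocks + 1)
    )

original = original_sum(100000)
rearranged = rearranged_sum(100000)

print(original, math.log(2) - 0.5)          # both near 0.193147
print(rearranged, (math.log(2) - 1) / 2)    # both near -0.153426
```

The same terms, taken in a different order, produce sums of opposite sign.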

To summarize, rearranging series is safe when the series is absolutely 
convergent, but is somewhat dangerous otherwise. (This is not to say 
that rearranging a series that is not absolutely convergent necessarily 
gives you the wrong answer - for instance, in theoretical physics one often 
performs similar manoeuvres, and one still (usually) obtains a correct
answer at the end - but doing so is risky, unless it is backed by a rigorous 
result such as Proposition 7.4.3.) 

— Exercises — 

Exercise 7.4.1. Let $\sum_{n=0}^{\infty} a_n$ be an absolutely convergent series of real numbers.
Let $f : \mathbf{N} \to \mathbf{N}$ be an increasing function (i.e., $f(n+1) > f(n)$ for all $n \in \mathbf{N}$).
Show that $\sum_{n=0}^{\infty} a_{f(n)}$ is also an absolutely convergent series. (Hint: try to
compare each partial sum of $\sum_{n=0}^{\infty} a_{f(n)}$ with a (slightly different) partial sum
of $\sum_{n=0}^{\infty} a_n$.)
7.5 The root and ratio tests

Now we can state and prove the famous root and ratio tests for convergence.

Theorem 7.5.1 (Root test). Let $\sum_{n=m}^{\infty} a_n$ be a series of real numbers,
and let $\alpha := \limsup_{n \to \infty} |a_n|^{1/n}$.

(a) If $\alpha < 1$, then the series $\sum_{n=m}^{\infty} a_n$ is absolutely convergent (and
hence conditionally convergent).

(b) If $\alpha > 1$, then the series $\sum_{n=m}^{\infty} a_n$ is not conditionally convergent
(and hence cannot be absolutely convergent either).

(c) If $\alpha = 1$, we cannot assert any conclusion.

Proof. First suppose that $\alpha < 1$. Note that we must have $\alpha \ge 0$,
since $|a_n|^{1/n} \ge 0$ for every $n$. Then we can find an $\varepsilon > 0$ such that
$0 < \alpha + \varepsilon < 1$ (for instance, we can set $\varepsilon := (1 - \alpha)/2$). By Proposition
6.4.12(a), there exists an $N \ge m$ such that $|a_n|^{1/n} \le \alpha + \varepsilon$ for all
$n \ge N$. In other words, we have $|a_n| \le (\alpha + \varepsilon)^n$ for all $n \ge N$. But
from the geometric series we have that $\sum_{n=N}^{\infty} (\alpha + \varepsilon)^n$ is absolutely
convergent, since $0 < \alpha + \varepsilon < 1$ (note that the fact that we start from
$N$ is irrelevant by Proposition 7.2.14(c)). Thus by the comparison test,
we see that $\sum_{n=N}^{\infty} a_n$ is absolutely convergent, and thus $\sum_{n=m}^{\infty} a_n$ is
absolutely convergent, by Proposition 7.2.14(c) again.

Now suppose that $\alpha > 1$. Then by Proposition 6.4.12(b), we see
that for every $N \ge m$ there exists an $n \ge N$ such that $|a_n|^{1/n} > 1$,
and hence that $|a_n| > 1$. In particular, $(a_n)_{n=N}^{\infty}$ is not 1-close to 0 for
any $N$, and hence $(a_n)_{n=m}^{\infty}$ is not eventually 1-close to 0. In particular,
$(a_n)_{n=m}^{\infty}$ does not converge to zero. Thus by the zero test, $\sum_{n=m}^{\infty} a_n$ is
not conditionally convergent.

For $\alpha = 1$, see Exercise 7.5.3. □

The root test is phrased using the limit superior, but of course if
$\lim_{n \to \infty} |a_n|^{1/n}$ converges then the limit is the same as the limit superior.
Thus one can phrase the root test using the limit instead of the limit
superior, but only when the limit exists.

The root test is sometimes difficult to use; however we can replace 
roots by ratios using the following lemma. 
Lemma 7.5.2. Let $(c_n)_{n=m}^{\infty}$ be a sequence of positive numbers. Then
we have

$$\liminf_{n \to \infty} \frac{c_{n+1}}{c_n} \le \liminf_{n \to \infty} c_n^{1/n} \le \limsup_{n \to \infty} c_n^{1/n} \le \limsup_{n \to \infty} \frac{c_{n+1}}{c_n}.$$

Proof. There are three inequalities to prove here. The middle inequality
follows from Proposition 6.4.12(c). We shall prove the last inequality,
and leave the first one to Exercise 7.5.1.

Write $L := \limsup_{n \to \infty} \frac{c_{n+1}}{c_n}$. If $L = +\infty$ then there is nothing to
prove (since $x \le +\infty$ for every extended real number $x$), so we may
assume that $L$ is a finite real number. (Note that $L$ cannot equal $-\infty$;
why?). Since $\frac{c_{n+1}}{c_n}$ is always positive, we know that $L \ge 0$.

Let $\varepsilon > 0$. By Proposition 6.4.12(a), we know that there exists
an $N \ge m$ such that $\frac{c_{n+1}}{c_n} \le L + \varepsilon$ for all $n \ge N$. This implies that
$c_{n+1} \le c_n (L + \varepsilon)$ for all $n \ge N$. By induction this implies that

$$c_n \le c_N (L + \varepsilon)^{n-N} \text{ for all } n \ge N$$

(why?). If we write $A := c_N (L + \varepsilon)^{-N}$, then we have

$$c_n \le A (L + \varepsilon)^n$$

and thus

$$c_n^{1/n} \le A^{1/n} (L + \varepsilon)$$

for all $n \ge N$. But we have

$$\lim_{n \to \infty} A^{1/n} (L + \varepsilon) = L + \varepsilon$$

by the limit laws (Theorem 6.1.19) and Lemma 6.5.3. Thus by the
comparison principle (Lemma 6.4.13) we have

$$\limsup_{n \to \infty} c_n^{1/n} \le L + \varepsilon.$$

But this is true for all $\varepsilon > 0$, so this must imply that

$$\limsup_{n \to \infty} c_n^{1/n} \le L$$

(why? prove by contradiction), as desired. □
From Theorem 7.5.1 and Lemma 7.5.2 (and Exercise 7.5.3) we have

Corollary 7.5.3 (Ratio test). Let $\sum_{n=m}^{\infty} a_n$ be a series of non-zero
numbers. (The non-zero hypothesis is required so that the ratios
$|a_{n+1}|/|a_n|$ appearing below are well-defined.)

• If $\limsup_{n \to \infty} \frac{|a_{n+1}|}{|a_n|} < 1$, then the series $\sum_{n=m}^{\infty} a_n$ is absolutely
convergent (hence conditionally convergent).

• If $\liminf_{n \to \infty} \frac{|a_{n+1}|}{|a_n|} > 1$, then the series $\sum_{n=m}^{\infty} a_n$ is not
conditionally convergent (and thus cannot be absolutely convergent).

• In the remaining cases, we cannot assert any conclusion.
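
To get a feel for the root and ratio tests, one can tabulate the two quantities for a specific series. The sketch below uses the hypothetical example $a_n = n/2^n$ (our choice, not from the text), for which both $|a_{n+1}|/|a_n|$ and $|a_n|^{1/n}$ approach $1/2$; floating-point values at a few indices merely suggest the limits.

```python
import math

def a(n):
    """Hypothetical example term a_n = n / 2^n, for n >= 1."""
    return n / 2**n

def ratio(n):
    """The quantity |a_{n+1}| / |a_n| used in the ratio test."""
    return abs(a(n + 1)) / abs(a(n))

def root(n):
    """The quantity |a_n|^(1/n) used in the root test."""
    return abs(a(n)) ** (1.0 / n)

ratios = [ratio(n) for n in (10, 100, 1000)]
roots = [root(n) for n in (10, 100, 1000)]

# Both tests predict absolute convergence; the partial sums settle quickly.
partial = math.fsum(a(n) for n in range(1, 61))
print(ratios, roots, partial)
```

In this example the ratios approach $1/2$ faster than the roots do, which is one practical reason the ratio test is often the more convenient of the two.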

Another consequence of Lemma 7.5.2 is the following limit: 

Proposition 7.5.4. We have $\lim_{n \to \infty} n^{1/n} = 1$.

Proof. By Lemma 7.5.2 we have

$$\limsup_{n \to \infty} n^{1/n} \le \limsup_{n \to \infty} (n+1)/n = \limsup_{n \to \infty} 1 + 1/n = 1$$

by Proposition 6.1.11 and limit laws (Theorem 6.1.19). Similarly we
have

$$\liminf_{n \to \infty} n^{1/n} \ge \liminf_{n \to \infty} (n+1)/n = \liminf_{n \to \infty} 1 + 1/n = 1.$$

The claim then follows from Proposition 6.4.12(c) and (f). □
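
A quick floating-point tabulation makes the limit in Proposition 7.5.4 plausible. The sample indices below are arbitrary choices of ours, and the computation is only suggestive.

```python
def nth_root_of_n(n):
    """Floating-point value of n^(1/n)."""
    return n ** (1.0 / n)

# A few sample indices, chosen arbitrarily for illustration.
samples = {n: nth_root_of_n(n) for n in (1, 10, 100, 10**4, 10**6)}

for n, value in samples.items():
    print(n, value)
```

The values rise above 1 for small $n$ and then decrease slowly back towards 1.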

Remark 7.5.5. In addition to the ratio and root tests, another very
useful convergence test is the integral test, which we will cover in
Proposition 11.6.4.


— Exercises — 

Exercise 7.5.1. Prove the first inequality in Lemma 7.5.2. 

Exercise 7.5.2. Let $x$ be a real number with $|x| < 1$, and $q$ be a real number.
Show that the series $\sum_{n=1}^{\infty} n^q x^n$ is absolutely convergent, and that
$\lim_{n \to \infty} n^q x^n = 0$.

Exercise 7.5.3. Give an example of a divergent series $\sum_{n=1}^{\infty} a_n$ of positive
numbers $a_n$ such that $\lim_{n \to \infty} a_{n+1}/a_n = \lim_{n \to \infty} a_n^{1/n} = 1$, and give an
example of a convergent series $\sum_{n=1}^{\infty} b_n$ of positive numbers $b_n$ such that
$\lim_{n \to \infty} b_{n+1}/b_n = \lim_{n \to \infty} b_n^{1/n} = 1$. (Hint: use Corollary 7.3.7.) This shows
that the ratio and root tests can be inconclusive even when the summands are
positive and all the limits converge.
Infinite sets


We now return to the study of set theory, and specifically to the study 
of cardinality of sets which are infinite (i.e., sets which do not have 
cardinality n for any natural number n), a topic which was initiated in 
Section 3.6. 

8.1 Countability 

From Proposition 3.6.14(c) we know that if X is a finite set, and Y is 
a proper subset of X , then Y does not have equal cardinality with X. 
However, this is not the case for infinite sets. For instance, from Theorem 
3.6.12 we know that the set N of natural numbers is infinite. The set 
N — {0} is also infinite, thanks to Proposition 3.6.14(a) (why?), and is a 
proper subset of N. However, the set N — {0}, despite being “smaller” 
than N, still has the same cardinality as N, because the function
$f : \mathbf{N} \to \mathbf{N} - \{0\}$ defined by $f(n) := n + 1$ is a bijection from $\mathbf{N}$ to $\mathbf{N} - \{0\}$.
(Why?) This is one characteristic of infinite sets; see Exercise 8.1.1. 

We now distinguish two types of infinite sets: the countable sets and 
the uncountable sets. 

Definition 8.1.1 (Countable sets). A set X is said to be countably 
infinite (or just countable) iff it has equal cardinality with the natural 
numbers N. A set X is said to be at most countable iff it is either 
countable or finite. We say that a set is uncountable if it is infinite but 
not countable. 

Remark 8.1.2. Countably infinite sets are also called denumerable sets. 

Examples 8.1.3. From the preceding discussion we see that $\mathbf{N}$ is countable,
and so is $\mathbf{N} - \{0\}$. Another example of a countable set is the even
natural numbers $\{2n : n \in \mathbf{N}\}$, since the function $f(n) := 2n$ provides a
bijection between $\mathbf{N}$ and the even natural numbers (why?).

Let $X$ be a countable set. Then, by definition, we know that there
exists a bijection $f : \mathbf{N} \to X$. Thus, every element of $X$ can be written
in the form $f(n)$ for exactly one natural number $n$. Informally, we thus
have

$$X = \{f(0), f(1), f(2), f(3), \ldots\}.$$

Thus, a countable set can be arranged in a sequence, so that we have
a zeroth element $f(0)$, followed by a first element $f(1)$, then a second
element $f(2)$, and so forth, in such a way that all these elements
$f(0), f(1), f(2), \ldots$ are distinct, and together they fill out all of $X$.
(This is why these sets are called countable: we can literally
count them one by one, starting from $f(0)$, then $f(1)$, and so forth.)
Viewed in this way, it is clear why the natural numbers 

N = {0, 1, 2, 3, . . .}, 

the positive integers 

N — {0} = {1, 2, 3, . . .}, 
and the even natural numbers 

{0,2, 4, 6, 8,...} 

are countable. However, it is not as obvious whether the integers 
Z = {... ,-3,-2, -1,0, 1,2,3,...} 

or the rationals 

Q = {0,1/4, -2/3,...} 

or the reals 

$\mathbf{R} = \{0, \sqrt{2}, -\pi, 2.5, \ldots\}$

are countable or not; for instance, it is not yet clear whether we can
arrange the real numbers in a sequence $f(0), f(1), f(2), \ldots$. We will
answer these questions shortly.

From Proposition 3.6.4 and Theorem 3.6.12, we know that countable 
sets are infinite; however it is not so clear whether all infinite sets are 
countable. Again, we will answer those questions shortly. We first need 
the following important principle. 





Proposition 8.1.4 (Well ordering principle). Let $X$ be a non-empty
subset of the natural numbers $\mathbf{N}$. Then there exists exactly one element
$n \in X$ such that $n \le m$ for all $m \in X$. In other words, every non-empty
set of natural numbers has a minimum element.

Proof. See Exercise 8.1.2. □ 

We will refer to the element n given by the well-ordering principle 
as the minimum of X, and write it as min(X). Thus for instance the 
minimum of the set {2, 4, 6, 8, . . .} is 2. This minimum is clearly the 
same as the infimum of X, as defined in Definition 5.5.10 (why?). 

Proposition 8.1.5. Let $X$ be an infinite subset of the natural numbers
$\mathbf{N}$. Then there exists a unique bijection $f : \mathbf{N} \to X$ which is increasing,
in the sense that $f(n + 1) > f(n)$ for all $n \in \mathbf{N}$. In particular, $X$ has
equal cardinality with $\mathbf{N}$ and is hence countable.

Proof. We will give an incomplete sketch of the proof, with some gaps 
marked by a question mark (?); these gaps will be filled in Exercise 8.1.3. 

We now define a sequence $a_0, a_1, a_2, \ldots$ of natural numbers recursively
by the formula

$$a_n := \min\{x \in X : x \ne a_m \text{ for all } m < n\}.$$

Intuitively speaking, $a_0$ is the smallest element of $X$; $a_1$ is the second
smallest element of $X$, i.e., the smallest element of $X$ once $a_0$ is removed;
$a_2$ is the third smallest element of $X$; and so forth. Observe that in order
to define $a_n$, one only needs to know the values of $a_m$ for all $m < n$, so
this definition is recursive. Also, since $X$ is infinite, the set
$\{x \in X : x \ne a_m \text{ for all } m < n\}$ is infinite(?), hence non-empty. Thus by the
well-ordering principle, the minimum $\min\{x \in X : x \ne a_m \text{ for all } m < n\}$ is
always well-defined.

One can show(?) that $a_n$ is an increasing sequence, i.e.

$$a_0 < a_1 < a_2 < \cdots$$

and in particular that(?) $a_n \ne a_m$ for all $n \ne m$. Also, we have(?)
$a_n \in X$ for each natural number $n$.

Now define the function $f : \mathbf{N} \to X$ by $f(n) := a_n$. From the
previous paragraph we know that $f$ is one-to-one. Now we show that $f$
is onto. In other words, we claim that for every $x \in X$, there exists an
$n$ such that $a_n = x$.

Let $x \in X$. Suppose for sake of contradiction that $a_n \ne x$ for every
natural number $n$. Then this implies(?) that $x$ is an element of the
set $\{x \in X : x \ne a_m \text{ for all } m < n\}$ for all $n$. By definition of $a_n$,
this implies that $x \ge a_n$ for every natural number $n$. However, since
$a_n$ is an increasing sequence, we have $a_n \ge n$ (?), and hence $x \ge n$ for
every natural number $n$. In particular we have $x \ge x + 1$, which is a
contradiction. Thus we must have $a_n = x$ for some natural number $n$,
and hence $f$ is onto.

Since $f : \mathbf{N} \to X$ is both one-to-one and onto, it is a bijection.
We have thus found at least one increasing bijection $f$ from $\mathbf{N}$ to $X$.
Now suppose for sake of contradiction that there was at least one other
increasing bijection $g$ from $\mathbf{N}$ to $X$ which was not equal to $f$. Then the
set $\{n \in \mathbf{N} : g(n) \ne f(n)\}$ is non-empty, and we define
$m := \min\{n \in \mathbf{N} : g(n) \ne f(n)\}$; thus in particular $g(m) \ne f(m) = a_m$, and
$g(n) = f(n) = a_n$ for all $n < m$. But we then must have(?)

$$g(m) = \min\{x \in X : x \ne a_t \text{ for all } t < m\} = a_m,$$

a contradiction. Thus there is no other increasing bijection from $\mathbf{N}$ to
$X$ other than $f$. □
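
The recursive definition of $a_n$ in this proof can be made concrete. The following is a minimal sketch, assuming the infinite set $X$ is presented by a membership test `in_X` (a hypothetical name introduced here, as are `increasing_enumeration` and the choice of the perfect squares as the example set). It relies on the observation that, once $a_0 < \cdots < a_{n-1}$ have been chosen, the minimum of the remaining elements is simply the first element of $X$ above $a_{n-1}$.

```python
from itertools import count, islice
from math import isqrt
from typing import Callable, Iterator

def increasing_enumeration(in_X: Callable[[int], bool]) -> Iterator[int]:
    """Yield a_0 < a_1 < a_2 < ... for X = {x in N : in_X(x)}.

    Assumes X is infinite; if X were finite, the search below would
    never terminate once X had been exhausted.
    """
    previous = -1  # nothing chosen yet
    while True:
        # The elements chosen so far are exactly the elements of X that are
        # <= previous, so min{x in X : x != a_m for all m < n} is the first
        # element of X strictly above `previous`.
        a_n = next(x for x in count(previous + 1) if in_X(x))
        yield a_n
        previous = a_n

def is_square(x: int) -> bool:
    """Membership test for the illustrative set X of perfect squares."""
    return isqrt(x) ** 2 == x

first_squares = list(islice(increasing_enumeration(is_square), 6))
print(first_squares)
```

The generator never produces the whole bijection at once, only as many values $f(0), f(1), \ldots$ as are requested, which matches the informal picture of counting the elements of $X$ one by one.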

Since finite sets are at most countable by definition, we thus have 

Corollary 8.1.6. All subsets of the natural numbers are at most countable.

Corollary 8.1.7. If X is an at most countable set, and Y is a subset 
of X , then Y is at most countable. 

Proof. If $X$ is finite then this follows from Proposition 3.6.14(c), so
assume $X$ is countable. Then there is a bijection $f : X \to \mathbf{N}$ between
$X$ and $\mathbf{N}$. Since $Y$ is a subset of $X$, and $f$ is a bijection from $X$ to
$\mathbf{N}$, then when we restrict $f$ to $Y$, we obtain a bijection between $Y$ and
$f(Y)$. (Why is this a bijection?) Thus $f(Y)$ has equal cardinality with
$Y$. But $f(Y)$ is a subset of $\mathbf{N}$, and hence at most countable by Corollary
8.1.6. Hence $Y$ is also at most countable. □

Proposition 8.1.8. Let $Y$ be a set, and let $f : \mathbf{N} \to Y$ be a function.
Then $f(\mathbf{N})$ is at most countable.

Proof. See Exercise 8.1.4. □
Corollary 8.1.9. Let $X$ be a countable set, and let $f : X \to Y$ be a
function. Then $f(X)$ is at most countable.

Proof. See Exercise 8.1.5. □ 

Proposition 8.1.10. Let $X$ be a countable set, and let $Y$ be a countable
set. Then $X \cup Y$ is a countable set.

Proof. See Exercise 8.1.7. □ 

To summarize, any subset or image of a countable set is at most 
countable, and any finite union of countable sets is still countable. We 
can now establish countability of the integers. 

Corollary 8.1.11. The integers Z are countable. 

Proof. We already know that the set $\mathbf{N} = \{0, 1, 2, 3, \ldots\}$ of natural
numbers is countable. The set $-\mathbf{N}$ defined by

$$-\mathbf{N} := \{-n : n \in \mathbf{N}\} = \{0, -1, -2, -3, \ldots\}$$

is also countable, since the map $f(n) := -n$ is a bijection between $\mathbf{N}$
and this set. Since the integers are the union of $\mathbf{N}$ and $-\mathbf{N}$, the claim
follows from Proposition 8.1.10. □
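
The proof above obtains countability of $\mathbf{Z}$ abstractly, from the union of two countable sets. For a concrete picture, here is one possible explicit bijection between $\mathbf{N}$ and $\mathbf{Z}$, obtained by alternating between non-negative and negative integers. This particular map, and the names `nat_to_int` and `int_to_nat`, are assumptions of this sketch and are not the construction used in the text.

```python
def nat_to_int(k: int) -> int:
    """Send 0, 1, 2, 3, 4, ... to 0, -1, 1, -2, 2, ..."""
    n, r = divmod(k, 2)
    # Even inputs 2n go to n; odd inputs 2n + 1 go to -(n + 1).
    return n if r == 0 else -(n + 1)

def int_to_nat(z: int) -> int:
    """Inverse of nat_to_int."""
    return 2 * z if z >= 0 else -2 * z - 1

print([nat_to_int(k) for k in range(7)])
```

Shifting the odd inputs to $-(n+1)$ rather than $-n$ is what avoids listing $0$ twice; the sets $\mathbf{N}$ and $-\mathbf{N}$ in the proof overlap at $0$, which is harmless for the union argument but would spoil injectivity of a naive interleaving.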

To establish countability of the rationals, we need to relate count- 
ability with Cartesian products. In particular, we need to show that the 
set $\mathbf{N} \times \mathbf{N}$ is countable. We first need a preliminary lemma:

Lemma 8.1.12. The set 

$$A := \{(n, m) \in \mathbf{N} \times \mathbf{N} : 0 \le m \le n\}$$


is countable. 

Proof. Define the sequence $a_0, a_1, a_2, \ldots$ recursively by setting $a_0 := 0$,
and $a_{n+1} := a_n + n + 1$ for all natural numbers $n$. Thus

$$a_0 = 0; \quad a_1 = 0 + 1; \quad a_2 = 0 + 1 + 2; \quad a_3 = 0 + 1 + 2 + 3; \quad \ldots.$$

By induction one can show that $a_n$ is increasing, i.e., that $a_n > a_m$
whenever $n > m$ (why?).

Now define the function $f : A \to \mathbf{N}$ by

$$f(n, m) := a_n + m.$$

We claim that $f$ is one-to-one. In other words, if $(n, m)$ and $(n', m')$ are
any two distinct elements of $A$, then we claim that $f(n, m) \ne f(n', m')$.

To prove this claim, let $(n, m)$ and $(n', m')$ be two distinct elements of
$A$. There are three cases: $n' = n$, $n' > n$, and $n' < n$. First suppose that
$n' = n$. Then we must have $m \ne m'$, otherwise $(n, m)$ and $(n', m')$ would
not be distinct. Thus $a_n + m \ne a_n + m'$, and hence $f(n, m) \ne f(n', m')$,
as desired.

Now suppose that $n' > n$. Then $n' \ge n + 1$, and hence

$$f(n', m') = a_{n'} + m' \ge a_{n'} \ge a_{n+1} = a_n + n + 1.$$

But since $(n, m) \in A$, we have $m \le n < n + 1$, and hence

$$f(n', m') \ge a_n + n + 1 > a_n + m = f(n, m),$$

and thus $f(n', m') \ne f(n, m)$.

The case $n' < n$ is proven similarly, by switching the roles of $n$ and
$n'$ in the previous argument. Thus we have shown that $f$ is one-to-one.
Thus $f$ is a bijection from $A$ to $f(A)$, and so $A$ has equal cardinality
with $f(A)$. But $f(A)$ is a subset of $\mathbf{N}$, and hence by Corollary 8.1.6
$f(A)$ is at most countable. Therefore $A$ is at most countable. But $A$
is clearly not finite. (Why? Hint: if $A$ was finite, then every subset of
$A$ would be finite, and in particular $\{(n, 0) : n \in \mathbf{N}\}$ would be finite,
but this is clearly countably infinite, a contradiction.) Thus, $A$ must be
countable. □
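
The map $f(n, m) := a_n + m$ from this proof is easy to experiment with. The following sketch computes $a_n$ directly from the recursion $a_0 := 0$, $a_{n+1} := a_n + n + 1$ and checks injectivity on a finite portion of $A$; the cutoff `N_MAX` and the function names are choices made here for illustration, and a finite check of this kind is of course no substitute for the case analysis above.

```python
def a(n: int) -> int:
    """a_0 := 0 and a_{k+1} := a_k + k + 1, computed from the recursion."""
    total = 0
    for k in range(n):
        total += k + 1
    return total

def f(n: int, m: int) -> int:
    """The map f(n, m) := a_n + m on A = {(n, m) : 0 <= m <= n}."""
    assert 0 <= m <= n, "(n, m) must lie in A"
    return a(n) + m

N_MAX = 50  # illustrative cutoff: only pairs with n < N_MAX are examined
pairs = [(n, m) for n in range(N_MAX) for m in range(n + 1)]
values = [f(n, m) for (n, m) in pairs]

# Distinct pairs receive distinct values on this finite portion of A.
print(len(pairs), len(set(values)))
```

On this range the values come out as $0, 1, 2, \ldots$ in order with no gaps, which suggests that $f$ is in fact onto $\mathbf{N}$ as well; the proof above does not need this, since injectivity together with Corollary 8.1.6 already suffices.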

Corollary 8.1.13. The set $\mathbf{N} \times \mathbf{N}$ is countable.

Proof. We already know that the set

$$A := \{(n, m) \in \mathbf{N} \times \mathbf{N} : 0 \le m \le n\}$$

is countable. This implies that the set

$$B := \{(n, m) \in \mathbf{N} \times \mathbf{N} : 0 \le n \le m\}$$

is also countable, since the map $f : A \to B$ given by $f(n, m) := (m, n)$
is a bijection from $A$ to $B$ (why?). But since $\mathbf{N} \times \mathbf{N}$ is the union of $A$
and $B$ (why?), the claim then follows from Proposition 8.1.10. □


Corollary 8.1.14. If $X$ and $Y$ are countable, then $X \times Y$ is countable.

Proof. See Exercise 8.1.8. □





Corollary 8.1.15. The rationals Q are countable. 

Proof. We already know that the integers $\mathbf{Z}$ are countable, which implies
that the non-zero integers $\mathbf{Z} - \{0\}$ are countable (why?). By Corollary
8.1.14, the set

$$\mathbf{Z} \times (\mathbf{Z} - \{0\}) = \{(a, b) : a, b \in \mathbf{Z}, b \ne 0\}$$

is thus countable. If one lets $f : \mathbf{Z} \times (\mathbf{Z} - \{0\}) \to \mathbf{Q}$ be the function
$f(a, b) := a/b$ (note that $f$ is well-defined since we prohibit $b$ from being
equal to 0), we see from Corollary 8.1.9 that $f(\mathbf{Z} \times (\mathbf{Z} - \{0\}))$ is at most
countable. But we have $f(\mathbf{Z} \times (\mathbf{Z} - \{0\})) = \mathbf{Q}$ (why? This is basically the
definition of the rationals $\mathbf{Q}$). Thus $\mathbf{Q}$ is at most countable. However,
$\mathbf{Q}$ cannot be finite, since it contains the infinite set $\mathbf{N}$. Thus $\mathbf{Q}$ is
countable. □

Remark 8.1.16. Because the rationals are countable, we know in principle
that it is possible to arrange the rational numbers as a sequence:

$$\mathbf{Q} = \{a_0, a_1, a_2, a_3, \ldots\}$$

such that every element of the sequence is different from every other
element, and that the elements of the sequence exhaust $\mathbf{Q}$ (i.e., every
rational number turns up as one of the elements $a_n$ of the sequence).
However, it is quite difficult (though not impossible) to actually try and
come up with an explicit sequence $a_0, a_1, \ldots$ which does this; see Exercise
8.1.10.
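
Although a closed formula for such a sequence is awkward, a procedure that lists the rationals without repetition is not hard to describe. The following is a minimal sketch, assuming one walks through the pairs $(a, b)$ with $b \ge 1$ in order of increasing $|a| + b$ and skips any value $a/b$ that has already appeared; this ordering, the bookkeeping set `seen`, and the name `rationals` are choices made here for illustration, and the result is a procedure rather than an explicit formula of the kind Exercise 8.1.10 asks for.

```python
from fractions import Fraction
from itertools import count, islice
from typing import Iterator

def rationals() -> Iterator[Fraction]:
    """Yield every rational number exactly once (a sketch, not a closed formula)."""
    seen = set()
    for s in count(1):                  # s = |a| + b, taken in increasing order
        for b in range(1, s + 1):       # denominator b >= 1
            a_abs = s - b               # the numerator has absolute value s - b
            signed = (0,) if a_abs == 0 else (a_abs, -a_abs)
            for a in signed:
                q = Fraction(a, b)
                if q not in seen:       # skip values that were already listed
                    seen.add(q)
                    yield q

print(list(islice(rationals(), 7)))
```

Every pair $(a, b)$ is reached after finitely many steps, so every rational appears, and the `seen` check ensures that no rational appears twice.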


— Exercises — 

Exercise 8.1.1. Let $X$ be a set. Show that $X$ is infinite if and only if there
exists a proper subset $Y \subsetneq X$ of $X$ which has the same cardinality as $X$. (This
exercise requires the axiom of choice, Axiom 8.1.)

Exercise 8.1.2. Prove Proposition 8.1.4. (Hint: you can either use induction, 
or use the principle of infinite descent, Exercise 4.4.2, or use the least upper 
bound (or greatest lower bound) principle, Theorem 5.5.9.) Does the well- 
ordering principle work if we replace the natural numbers by the integers? 
What if we replace the natural numbers by the positive rationals? Explain. 

Exercise 8.1.3. Fill in the gaps marked (?) in Proposition 8.1.5. 

Exercise 8.1.4. Prove Proposition 8.1.8. (Hint: the basic problem here is that
$f$ is not assumed to be one-to-one. Define $A$ to be the set

$$A := \{n \in \mathbf{N} : f(m) \ne f(n) \text{ for all } 0 \le m < n\};$$

informally speaking, $A$ is the set of natural numbers $n$ for which $f(n)$ does not
appear in the sequence $f(0), f(1), \ldots, f(n-1)$. Prove that when $f$ is restricted
to $A$, it becomes a bijection from $A$ to $f(\mathbf{N})$. Then use Proposition 8.1.5.)

Exercise 8.1.5. Use Proposition 8.1.8 to prove Corollary 8.1.9. 

Exercise 8.1.6. Let $A$ be a set. Show that $A$ is at most countable if and only
if there exists an injective map $f : A \to \mathbf{N}$ from $A$ to $\mathbf{N}$.

Exercise 8.1.7. Prove Proposition 8.1.10. (Hint: by hypothesis, we have a
bijection $f : \mathbf{N} \to X$, and a bijection $g : \mathbf{N} \to Y$. Now define $h : \mathbf{N} \to X \cup Y$
by setting $h(2n) := f(n)$ and $h(2n + 1) := g(n)$ for every natural number $n$,
and show that $h(\mathbf{N}) = X \cup Y$. Then use Corollary 8.1.9, and show that $X \cup Y$
cannot possibly be finite.)

Exercise 8.1.8. Use Corollary 8.1.13 to prove Corollary 8.1.14. 

Exercise 8.1.9. Suppose that $I$ is an at most countable set, and for each $\alpha \in I$,
let $A_\alpha$ be an at most countable set. Show that the set $\bigcup_{\alpha \in I} A_\alpha$ is also at most
countable. In particular, countable unions of countable sets are countable.
(This exercise requires the axiom of choice, see Section 8.4.)

Exercise 8.1.10. Find a bijection $f : \mathbf{N} \to \mathbf{Q}$ from the natural numbers to the
rationals. (Warning: this is actually rather tricky to do explicitly; it is difficult
to get $f$ to be simultaneously injective and surjective.)


8.2 Summation on infinite sets 

We now introduce the concept of summation on countable sets, which
will be well-defined provided that the sum is absolutely convergent. 

Definition 8.2.1 (Series on countable sets). Let $X$ be a countable set,
and let $f : X \to \mathbf{R}$ be a function. We say that the series $\sum_{x \in X} f(x)$
is absolutely convergent iff for some bijection $g : \mathbf{N} \to X$, the sum
$\sum_{n=0}^\infty f(g(n))$ is absolutely convergent. We then define the sum of
$\sum_{x \in X} f(x)$ by the formula

$$\sum_{x \in X} f(x) = \sum_{n=0}^\infty f(g(n)).$$

From Proposition 7.4.3 (and Proposition 3.6.4), one can show that 
these definitions do not depend on the choice of g, and so are well defined. 
We can now give an important theorem about double summations. 

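Theorem 8.2.2 (Fubini's theorem for infinite sums). Let $f : \mathbf{N} \times \mathbf{N} \to \mathbf{R}$
be a function such that $\sum_{(n,m) \in \mathbf{N} \times \mathbf{N}} f(n, m)$ is absolutely convergent.
Then we have

$$\sum_{n=0}^\infty \left( \sum_{m=0}^\infty f(n, m) \right) = \sum_{(n,m) \in \mathbf{N} \times \mathbf{N}} f(n, m) = \sum_{(m,n) \in \mathbf{N} \times \mathbf{N}} f(n, m) = \sum_{m=0}^\infty \left( \sum_{n=0}^\infty f(n, m) \right).$$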


In other words, we can switch the order of infinite sums provided 
that the entire sum is absolutely convergent. You should go back and 
compare this with Example 1.2.5. 
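
To see that the absolute convergence hypothesis cannot simply be dropped, consider the following sketch. The array `g` (equal to $1$ on the diagonal, $-1$ immediately to its right, and $0$ elsewhere) is an illustrative choice made here and is an assumption of this example rather than something taken from the text. Every row and every column of `g` has only finitely many non-zero entries, so each inner sum can be computed exactly as a finite sum.

```python
def g(n: int, m: int) -> int:
    """Illustrative array: 1 on the diagonal, -1 just to its right, 0 elsewhere."""
    if m == n:
        return 1
    if m == n + 1:
        return -1
    return 0

def row_sum(n: int) -> int:
    # Row n vanishes for m > n + 1, so this finite sum equals the full sum over m.
    return sum(g(n, m) for m in range(n + 2))

def col_sum(m: int) -> int:
    # Column m vanishes for n > m, so this finite sum equals the full sum over n.
    return sum(g(n, m) for n in range(m + 1))

K = 200  # illustrative cutoff for the outer sums
rows_then_columns = sum(row_sum(n) for n in range(K))    # every row sums to 0
columns_then_rows = sum(col_sum(m) for m in range(K))    # column 0 sums to 1, the rest to 0
diagonal_abs_sum = sum(abs(g(n, n)) for n in range(K))   # grows with K
print(rows_then_columns, columns_then_rows, diagonal_abs_sum)
```

Since every row sums to $0$ and every column after the first sums to $0$, the two iterated sums are $0$ and $1$ respectively, whatever the cutoff. This does not contradict Theorem 8.2.2: the sums of $|g|$ over finite sets are unbounded (the diagonal alone contributes `K`), so the series is not absolutely convergent.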

Proof. (A sketch only; this proof is considerably more complex than the 
other proofs, and is optional reading.) The second equality follows easily 
from Proposition 7.4.3 (and Proposition 3.6.4). We shall just prove the 
first equality, as the third is very similar (basically one switches the role 
of n and m). 

Let us first consider the case when $f(n, m)$ is always non-negative
(we will deal with the general case later). Write

$$L := \sum_{(n,m) \in \mathbf{N} \times \mathbf{N}} f(n, m);$$

our task is to show that the series $\sum_{n=0}^\infty \left( \sum_{m=0}^\infty f(n, m) \right)$ converges to
$L$.

One can easily show that $\sum_{(n,m) \in X} f(n, m) \le L$ for all finite sets
$X \subseteq \mathbf{N} \times \mathbf{N}$. (Why? Use a bijection $g$ between $\mathbf{N} \times \mathbf{N}$ and $\mathbf{N}$, and
then use the fact that $g(X)$ is finite, hence bounded.) In particular, for
every $n \in \mathbf{N}$ and $M \in \mathbf{N}$ we have $\sum_{m=0}^M f(n, m) \le L$, which implies by
Proposition 6.3.8 that $\sum_{m=0}^\infty f(n, m)$ is convergent for each $n$. Similarly,
for any $N \in \mathbf{N}$ and $M \in \mathbf{N}$ we have (by Corollary 7.1.14)

$$\sum_{n=0}^N \sum_{m=0}^M f(n, m) \le \sum_{(n,m) \in X} f(n, m) \le L$$

where $X$ is the set $\{(n, m) \in \mathbf{N} \times \mathbf{N} : n \le N, m \le M\}$, which is finite
by Proposition 3.6.14. Taking suprema of this as $M \to \infty$ we have (by
limit laws, and an induction on $N$)

$$\sum_{n=0}^N \sum_{m=0}^\infty f(n, m) \le L.$$

By Proposition 6.3.8, this implies that $\sum_{n=0}^\infty \sum_{m=0}^\infty f(n, m)$ converges,
and

$$\sum_{n=0}^\infty \sum_{m=0}^\infty f(n, m) \le L.$$

To finish the proof, it will suffice to show that

$$\sum_{n=0}^\infty \sum_{m=0}^\infty f(n, m) \ge L - \varepsilon$$

for every $\varepsilon > 0$. (Why will this be enough? Prove by contradiction.) So,
let $\varepsilon > 0$. By definition of $L$, we can then find a finite set $X \subseteq \mathbf{N} \times \mathbf{N}$
such that $\sum_{(n,m) \in X} f(n, m) \ge L - \varepsilon$. (Why?) This set, being finite,
must be contained in some set of the form
$Y := \{(n, m) \in \mathbf{N} \times \mathbf{N} : n \le N; m \le M\}$. (Why? Use induction.) Thus by Corollary 7.1.14

$$\sum_{n=0}^N \sum_{m=0}^M f(n, m) = \sum_{(n,m) \in Y} f(n, m) \ge \sum_{(n,m) \in X} f(n, m) \ge L - \varepsilon$$

and hence

$$\sum_{n=0}^\infty \sum_{m=0}^\infty f(n, m) \ge \sum_{n=0}^N \sum_{m=0}^\infty f(n, m) \ge \sum_{n=0}^N \sum_{m=0}^M f(n, m) \ge L - \varepsilon$$

as desired.

This proves the claim when the $f(n, m)$ are all non-negative. A
similar argument works when the $f(n, m)$ are all non-positive (in fact,
one can simply apply the result just obtained to the function $-f(n, m)$,
and then use limit laws to remove the minus sign). For the general case, note that
any function $f(n, m)$ can be written (why?) as $f_+(n, m) + f_-(n, m)$,
where $f_+(n, m)$ is the positive part of $f(n, m)$ (i.e., it equals $f(n, m)$
when $f(n, m)$ is positive, and 0 otherwise), and $f_-$ is the negative part
of $f(n, m)$ (it equals $f(n, m)$ when $f(n, m)$ is negative, and 0 otherwise).
It is easy to show that if $\sum_{(n,m) \in \mathbf{N} \times \mathbf{N}} f(n, m)$ is absolutely convergent,
then so are $\sum_{(n,m) \in \mathbf{N} \times \mathbf{N}} f_+(n, m)$ and $\sum_{(n,m) \in \mathbf{N} \times \mathbf{N}} f_-(n, m)$. So now
one applies the results just obtained to $f_+$ and to $f_-$ and adds them
together using limit laws to obtain the result for a general $f$. □





There is another characterization of absolutely convergent series. 

Lemma 8.2.3. Let $X$ be an at most countable set, and let $f : X \to \mathbf{R}$
be a function. Then the series $\sum_{x \in X} f(x)$ is absolutely convergent if
and only if

$$\sup \left\{ \sum_{x \in A} |f(x)| : A \subseteq X, A \text{ finite} \right\} < \infty.$$

Proof. See Exercise 8.2.1. □ 

Inspired by this lemma, we may now define the concept of an abso- 
lutely convergent series even when the set X could be uncountable. (We 
give some examples of uncountable sets in the next section.) 

Definition 8.2.4. Let $X$ be a set (which could be uncountable), and
let $f : X \to \mathbf{R}$ be a function. We say that the series $\sum_{x \in X} f(x)$ is
absolutely convergent iff

$$\sup \left\{ \sum_{x \in A} |f(x)| : A \subseteq X, A \text{ finite} \right\} < \infty.$$

Note that we have not yet said what the series $\sum_{x \in X} f(x)$ is equal
to. This shall be accomplished by the following lemma.

Lemma 8.2.5. Let $X$ be a set (which could be uncountable), and let
$f : X \to \mathbf{R}$ be a function such that the series $\sum_{x \in X} f(x)$ is absolutely
convergent. Then the set $\{x \in X : f(x) \ne 0\}$ is at most countable. (This
result requires the axiom of choice, see Section 8.4.)

Proof. See Exercise 8.2.2. □ 

Because of this, we can define the value of $\sum_{x \in X} f(x)$ for any absolutely
convergent series on an uncountable set $X$ by the formula

$$\sum_{x \in X} f(x) := \sum_{x \in X : f(x) \ne 0} f(x),$$

since we have replaced a sum on an uncountable set $X$ by a sum on
the at most countable set $\{x \in X : f(x) \ne 0\}$. (Note that if the former sum is
absolutely convergent, then the latter one is also.) Note also that this
definition is consistent with the definitions we already have for series on
countable sets.
We give some laws for absolutely convergent series on arbitrary sets.

Proposition 8.2.6 (Absolutely convergent series laws). Let $X$ be an
arbitrary set (possibly uncountable), and let $f : X \to \mathbf{R}$ and $g : X \to \mathbf{R}$
be functions such that the series $\sum_{x \in X} f(x)$ and $\sum_{x \in X} g(x)$ are both
absolutely convergent.

(a) The series $\sum_{x \in X} (f(x) + g(x))$ is absolutely convergent, and

$$\sum_{x \in X} (f(x) + g(x)) = \sum_{x \in X} f(x) + \sum_{x \in X} g(x).$$

(b) If $c$ is a real number, then $\sum_{x \in X} c f(x)$ is absolutely convergent,
and

$$\sum_{x \in X} c f(x) = c \sum_{x \in X} f(x).$$

(c) If $X = X_1 \cup X_2$ for some disjoint sets $X_1$ and $X_2$, then $\sum_{x \in X_1} f(x)$
and $\sum_{x \in X_2} f(x)$ are absolutely convergent, and

$$\sum_{x \in X_1 \cup X_2} f(x) = \sum_{x \in X_1} f(x) + \sum_{x \in X_2} f(x).$$

Conversely, if $h : X \to \mathbf{R}$ is such that $\sum_{x \in X_1} h(x)$ and
$\sum_{x \in X_2} h(x)$ are absolutely convergent, then $\sum_{x \in X_1 \cup X_2} h(x)$ is also
absolutely convergent, and

$$\sum_{x \in X_1 \cup X_2} h(x) = \sum_{x \in X_1} h(x) + \sum_{x \in X_2} h(x).$$

(d) If $Y$ is another set, and $\phi : Y \to X$ is a bijection, then
$\sum_{y \in Y} f(\phi(y))$ is absolutely convergent, and

$$\sum_{y \in Y} f(\phi(y)) = \sum_{x \in X} f(x).$$

(This result requires the axiom of choice when $X$ is uncountable, see
Section 8.4.)

Proof. See Exercise 8.2.3. □ 

Recall in Example 7.4.4 that if a series was conditionally conver- 
gent, but not absolutely convergent, then its behaviour with respect to 
rearrangements was bad. We now analyze this phenomenon further. 
Lemma 8.2.7. Let $\sum_{n=0}^\infty a_n$ be a series of real numbers which is
conditionally convergent, but not absolutely convergent. Define the sets
$A_+ := \{n \in \mathbf{N} : a_n \ge 0\}$ and $A_- := \{n \in \mathbf{N} : a_n < 0\}$, thus
$A_+ \cup A_- = \mathbf{N}$ and $A_+ \cap A_- = \emptyset$. Then both of the series $\sum_{n \in A_+} a_n$
and $\sum_{n \in A_-} a_n$ are not conditionally convergent (and thus not absolutely
convergent).

Proof. See Exercise 8.2.4. □ 

We are now ready to present a remarkable theorem of Georg Rie- 
mann (1826-1866), which asserts that a series which converges condi- 
tionally but not absolutely can be rearranged to converge to any value 
one pleases! 

Theorem 8.2.8. Let $\sum_{n=0}^\infty a_n$ be a series which is conditionally
convergent, but not absolutely convergent, and let $L$ be any real number. Then
there exists a bijection $f : \mathbf{N} \to \mathbf{N}$ such that $\sum_{m=0}^\infty a_{f(m)}$ converges
conditionally to $L$.

Proof. (Optional) We give a sketch of the proof, leaving the details to
be filled in in Exercise 8.2.5. Let $A_+$ and $A_-$ be the sets in Lemma 8.2.7;
from that lemma we know that $\sum_{n \in A_+} a_n$ and $\sum_{n \in A_-} a_n$ both fail to be
absolutely convergent. In particular $A_+$ and $A_-$ are infinite (why?). By
Proposition 8.1.5 we can then find increasing bijections $f_+ : \mathbf{N} \to A_+$
and $f_- : \mathbf{N} \to A_-$. Thus the sums $\sum_{m=0}^\infty a_{f_+(m)}$ and $\sum_{m=0}^\infty a_{f_-(m)}$ both
fail to be absolutely convergent (why?). The plan shall be to select terms
from the divergent series $\sum_{m=0}^\infty a_{f_+(m)}$ and $\sum_{m=0}^\infty a_{f_-(m)}$ in a well-chosen
order so as to keep their difference converging towards $L$.

We define the sequence $n_0, n_1, n_2, \ldots$ of natural numbers recursively
as follows. Suppose that $j$ is a natural number, and that $n_i$ has already
been defined for all $i < j$ (this is vacuously true if $j = 0$). We then
define $n_j$ by the following rule:

(I) If $\sum_{0 \le i < j} a_{n_i} < L$, then we set

$$n_j := \min\{n \in A_+ : n \ne n_i \text{ for all } i < j\}.$$

(II) If instead $\sum_{0 \le i < j} a_{n_i} \ge L$, then we set

$$n_j := \min\{n \in A_- : n \ne n_i \text{ for all } i < j\}.$$
Note that this recursive definition is well-defined because $A_+$ and
$A_-$ are infinite, and so the sets $\{n \in A_+ : n \ne n_i \text{ for all } i < j\}$ and
$\{n \in A_- : n \ne n_i \text{ for all } i < j\}$ are never empty. (Intuitively,
we add a non-negative number to the series whenever the partial sum
is too low, and add a negative number when the sum is too high.) One
can then verify the following claims:

• The map $j \mapsto n_j$ is injective. (Why?)

• Case I occurs an infinite number of times, and Case II also occurs
an infinite number of times. (Why? Prove by contradiction.)

• The map $j \mapsto n_j$ is surjective. (Why?)

• We have $\lim_{j \to \infty} a_{n_j} = 0$. (Why? Note from Corollary 7.2.6 that
$\lim_{n \to \infty} a_n = 0$.)

• We have $\lim_{j \to \infty} \sum_{0 \le i < j} a_{n_i} = L$. (Why?)

The claim then follows by setting $f(i) := n_i$ for all $i$. □
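
The selection rule (I)/(II) in this proof is effectively an algorithm, and it can be tried out on a specific series. The sketch below applies it to $a_n := (-1)^n/(n+1)$ (the alternating harmonic series, chosen here for illustration); the function name `rearrangement`, the target $L = 1$, and the number of steps are likewise assumptions of this example. Because the rule always takes the smallest unused index of $A_+$ or $A_-$, it suffices to walk through each of these sets in increasing order.

```python
from itertools import count

def a(n: int) -> float:
    """Illustrative conditionally convergent series: 1 - 1/2 + 1/3 - 1/4 + ..."""
    return (-1) ** n / (n + 1)

def rearrangement(L: float, steps: int):
    """Return the indices n_0, ..., n_{steps-1} chosen by rules (I)/(II) and the final partial sum."""
    pos = (n for n in count() if a(n) >= 0)   # A_+ in increasing order
    neg = (n for n in count() if a(n) < 0)    # A_- in increasing order
    indices, partial = [], 0.0
    for _ in range(steps):
        # Rule (I): partial sum below L, take the least unused index of A_+.
        # Rule (II): otherwise, take the least unused index of A_-.
        n_j = next(pos) if partial < L else next(neg)
        indices.append(n_j)
        partial += a(n_j)
    return indices, partial

indices, partial = rearrangement(L=1.0, steps=10_000)
print(indices[:8], partial)
```

The partial sums oscillate around $L$ with overshoots controlled by the size of the terms being added, which is the mechanism behind the last two bulleted claims; the floating-point run only illustrates this and does not replace the verification requested in Exercise 8.2.5.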

— Exercises — 

Exercise 8.2.1. Prove Lemma 8.2.3. (Hint: you may find Exercise 3.6.3 to be 
useful.) 

Exercise 8.2.2. Prove Lemma 8.2.5. (Hint: first show that if $M$ is the quantity

$$M := \sup \left\{ \sum_{x \in A} |f(x)| : A \subseteq X, A \text{ finite} \right\}$$

then the sets $\{x \in X : |f(x)| > 1/n\}$ are finite with cardinality at most $Mn$
for every positive integer $n$. Then use Exercise 8.1.9 (which uses the axiom of
choice, see Section 8.4).)

Exercise 8.2.3. Prove Proposition 8.2.6. (Hint: you may of course use all the 
results from Chapter 7 to do this.) 

Exercise 8.2.4. Prove Lemma 8.2.7. (Hint: prove by contradiction, and use 
limit laws.) 

Exercise 8.2.5. Explain the gaps marked (why?) in the proof of Theorem 8.2.8. 

Exercise 8.2.6. Let $\sum_{n=0}^\infty a_n$ be a series which is conditionally convergent, but
not absolutely convergent. Show that there exists a bijection $f : \mathbf{N} \to \mathbf{N}$ such
that $\sum_{m=0}^\infty a_{f(m)}$ diverges to $+\infty$, or more precisely that

$$\liminf_{N \to \infty} \sum_{m=0}^N a_{f(m)} = \limsup_{N \to \infty} \sum_{m=0}^N a_{f(m)} = +\infty.$$

(Of course, a similar statement holds with $+\infty$ replaced by $-\infty$.)

8.3 Uncountable sets 

We have just shown that a lot of infinite sets are countable - even such 
sets as the rationals, for which it is not obvious how to arrange as a 
sequence. After such examples, one may begin to hope that other infinite 
sets, such as the real numbers, are also countable - after all, the real 
numbers are nothing more than (formal) limits of the rationals, and 
we’ve already shown the rationals are countable, so it seems plausible 
that the reals are also countable. 

It was thus a great shock when Georg Cantor (1845-1918) showed
in 1873 that certain sets - including the real numbers $\mathbf{R}$ - are in fact
uncountable - no matter how hard you try, you cannot arrange the real
numbers $\mathbf{R}$ as a sequence $a_0, a_1, a_2, \ldots$. (Of course, the real numbers $\mathbf{R}$
can contain many infinite sequences, e.g., the sequence $0, 1, 2, 3, 4, \ldots$.
However, what Cantor proved is that no such sequence can ever exhaust
the real numbers; no matter what sequence of real numbers you choose,
there will always be some real numbers that are not covered by that
sequence.)

Recall from Remark 3.4.10 that if $X$ is a set, then the power set of
$X$, denoted $2^X := \{A : A \subseteq X\}$, is the set of all subsets of $X$. Thus for
instance $2^{\{1,2\}} = \{\emptyset, \{1\}, \{2\}, \{1, 2\}\}$. The reason for the notation $2^X$ is
given in Exercise 8.3.1.

Theorem 8.3.1 (Cantor's theorem). Let $X$ be an arbitrary set (finite
or infinite). Then the sets $X$ and $2^X$ cannot have equal cardinality.

Proof. Suppose for sake of contradiction that the sets $X$ and $2^X$ had
equal cardinality. Then there exists a bijection $f : X \to 2^X$ between $X$
and the power set of $X$. Now consider the set

$$A := \{x \in X : x \notin f(x)\}.$$

Note that this set is well-defined since $f(x)$ is an element of $2^X$ and is
hence a subset of $X$. Clearly $A$ is a subset of $X$, hence is an element
of $2^X$. Since $f$ is a bijection, there must therefore exist $x \in X$ such
that $f(x) = A$. There are now two cases, depending on whether $x \in A$
or $x \notin A$. If $x \in A$, then by definition of $A$ we have $x \notin f(x)$, hence
$x \notin A$, a contradiction. But if $x \notin A$, then $x \notin f(x)$, hence by definition
of $A$ we have $x \in A$, a contradiction. Thus in either case we have a
contradiction. □
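
For a small finite set, the diagonal argument can be checked exhaustively. The following sketch enumerates every map $f : X \to 2^X$ for $X = \{0, 1, 2\}$ and confirms that the set $A := \{x \in X : x \notin f(x)\}$ is never in the image of $f$; the choice of $X$ and the names `power_set`, `diagonal_set` and `missed_every_time` are introduced here for illustration only.

```python
from itertools import combinations, product

def power_set(X):
    """Return all subsets of X as frozensets."""
    xs = list(X)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]

def diagonal_set(X, f):
    """A := {x in X : x not in f(x)} for a map f : X -> 2^X given as a dict."""
    return frozenset(x for x in X if x not in f[x])

X = [0, 1, 2]
subsets = power_set(X)

# Try every possible assignment x -> f(x) and test whether A is one of the f(x).
missed_every_time = all(
    diagonal_set(X, dict(zip(X, images))) not in images
    for images in product(subsets, repeat=len(X))
)
print(len(subsets), missed_every_time)
```

For finite $X$ the theorem also follows by simply counting elements, but the check shows something slightly more specific: the argument names an explicit subset that each map fails to reach, and that is the part of the proof which carries over to infinite sets.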

Remark 8.3.2. The reader should compare the proof of Cantor's theorem
with the statement of Russell's paradox (Section 3.2). The point
is that a bijection between $X$ and $2^X$ would come dangerously close to
the concept of a set $X$ "containing itself".

Corollary 8.3.3. $2^{\mathbf{N}}$ is uncountable.

Proof. By Theorem 8.3.1, $2^{\mathbf{N}}$ cannot have equal cardinality with $\mathbf{N}$,
hence is either uncountable or finite. However, $2^{\mathbf{N}}$ contains as a subset
the set of singletons $\{\{n\} : n \in \mathbf{N}\}$, which is clearly bijective to $\mathbf{N}$
and hence countably infinite. Thus $2^{\mathbf{N}}$ cannot be finite (by Proposition
3.6.14), and is hence uncountable. □

Cantor’s theorem has the following important (and unintuitive) con- 
sequence. 

Corollary 8.3.4. R is uncountable. 

Proof. Let us define the map $f : 2^{\mathbf{N}} \to \mathbf{R}$ by the formula

$$f(A) := \sum_{n \in A} 10^{-n}.$$

Observe that since $\sum_{n=0}^\infty 10^{-n}$ is an absolutely convergent series (by
Lemma 7.3.3), the series $\sum_{n \in A} 10^{-n}$ is also absolutely convergent (by
Proposition 8.2.6(c)). Thus the map $f$ is well defined. We now claim
that $f$ is injective. Suppose for sake of contradiction that there were
two distinct sets $A, B \in 2^{\mathbf{N}}$ such that $f(A) = f(B)$. Since $A \ne B$, the
set $(A \backslash B) \cup (B \backslash A)$ is a non-empty subset of $\mathbf{N}$. By the well-ordering
principle (Proposition 8.1.4), we can then define the minimum of this
set, say $n_0 := \min (A \backslash B) \cup (B \backslash A)$. Thus $n_0$ either lies in $A \backslash B$ or $B \backslash A$.
By symmetry we may assume it lies in $A \backslash B$. Then $n_0 \in A$, $n_0 \notin B$, and
for all $n < n_0$ we either have $n \in A, B$ or $n \notin A, B$. Thus

$$\begin{aligned}
0 &= f(A) - f(B) \\
&= \sum_{n \in A} 10^{-n} - \sum_{n \in B} 10^{-n} \\
&= \left( \sum_{n < n_0 : n \in A} 10^{-n} + 10^{-n_0} + \sum_{n > n_0 : n \in A} 10^{-n} \right) - \left( \sum_{n < n_0 : n \in B} 10^{-n} + \sum_{n > n_0 : n \in B} 10^{-n} \right) \\
&= 10^{-n_0} + \sum_{n > n_0 : n \in A} 10^{-n} - \sum_{n > n_0 : n \in B} 10^{-n} \\
&\ge 10^{-n_0} + 0 - \sum_{n > n_0} 10^{-n} \\
&\ge 10^{-n_0} - \frac{1}{9} 10^{-n_0} \\
&> 0,
\end{aligned}$$

a contradiction, where we have used the geometric series lemma (Lemma
7.3.3) to sum

$$\sum_{n > n_0} 10^{-n} = \sum_{m=0}^\infty 10^{-(n_0 + 1 + m)} = 10^{-n_0 - 1} \sum_{m=0}^\infty 10^{-m} = \frac{1}{9} 10^{-n_0}.$$

Thus $f$ is injective, which means that $f(2^{\mathbf{N}})$ has the same cardinality as
$2^{\mathbf{N}}$ and is thus uncountable. Since $f(2^{\mathbf{N}})$ is a subset of $\mathbf{R}$, this forces $\mathbf{R}$
to be uncountable also (otherwise this would contradict Corollary 8.1.7),
and we are done. □
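
The map $f(A) := \sum_{n \in A} 10^{-n}$ can be explored with exact rational arithmetic, at least for finite sets $A$. The sketch below checks that $f$ takes distinct values on all subsets of $\{0, 1, \ldots, 7\}$; the size of this universe and the names `f`, `subsets` and `images` are choices made here for illustration, and a finite check cannot see the infinite sets that the proof has to handle.

```python
from fractions import Fraction
from itertools import combinations

def f(A) -> Fraction:
    """f(A) := sum of 10^(-n) over n in A, for a finite set A of natural numbers."""
    return sum((Fraction(1, 10 ** n) for n in A), Fraction(0))

universe = range(8)  # illustrative finite universe {0, 1, ..., 7}
subsets = [frozenset(c) for r in range(9) for c in combinations(universe, r)]
images = {f(A) for A in subsets}

# Injectivity on this finite family: as many distinct values as subsets.
print(len(subsets), len(images))
```

The base 10 matters once infinite sets are allowed. With base 2 in its place the corresponding map would fail to be injective, since $2^{-0} = 1 = \sum_{n \ge 1} 2^{-n}$, so that $\{0\}$ and $\{1, 2, 3, \ldots\}$ would receive the same value. With base 10 the tail $\sum_{n > n_0} 10^{-n}$ is only $\frac{1}{9} 10^{-n_0}$, strictly smaller than the leading term, which is exactly the inequality used in the proof.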


Remark 8.3.5. We will give another proof of this result using measure 
theory in Exercise 11.42.6. 

Remark 8.3.6. Corollary 8.3.4 shows that the reals have strictly larger 
cardinality than the natural numbers (in the sense of Exercise 3.6.7). 
One could ask whether there exist any sets which have strictly larger 
cardinality than the natural numbers, but strictly smaller cardinality 
than the reals. The Continuum Hypothesis asserts that no such sets 
exist. Interestingly, it was shown in separate works of Kurt Gödel
(1906-1978) and Paul Cohen (1934-2007) that this hypothesis is independent
of the other axioms of set theory; it can neither be proved nor disproved
in that set of axioms (unless those axioms are inconsistent, which is 
highly unlikely). 


— Exercises — 

Exercise 8.3.1. Let $X$ be a finite set of cardinality $n$. Show that $2^X$ is a finite
set of cardinality $2^n$. (Hint: use induction on $n$.)

Exercise 8.3.2. Let $A, B, C$ be sets such that $A \subseteq B \subseteq C$, and suppose that
there is an injection $f : C \to A$. Define the sets $D_0, D_1, D_2, \ldots$ recursively by
setting $D_0 := B \backslash A$, and then $D_{n+1} := f(D_n)$ for all natural numbers $n$. Prove
that the sets $D_0, D_1, \ldots$ are all disjoint from each other (i.e., $D_n \cap D_m = \emptyset$
whenever $n \ne m$). Also show that if $g : A \to B$ is the function defined by
setting $g(x) := f^{-1}(x)$ when $x \in \bigcup_{n=1}^\infty D_n$, and $g(x) := x$ when $x \notin \bigcup_{n=1}^\infty D_n$,
then $g$ does indeed map $A$ to $B$ and is a bijection between the two. In particular,
$A$ and $B$ have the same cardinality.

Exercise 8.3.3. Recall from Exercise 3.6.7 that a set $A$ is said to have lesser or
equal cardinality than a set $B$ iff there is an injective map $f : A \to B$ from $A$
to $B$. Using Exercise 8.3.2, show that if $A, B$ are sets such that $A$ has lesser or
equal cardinality to $B$ and $B$ has lesser or equal cardinality to $A$, then $A$ and
$B$ have equal cardinality. (This is known as the Schröder-Bernstein theorem,
after Ernst Schröder (1841-1902) and Felix Bernstein (1878-1956).)

Exercise 8.3.4. Let us say that a set $A$ has strictly lesser cardinality than a set
$B$ if $A$ has lesser than or equal cardinality to $B$ (in the sense of Exercise 3.6.7)
but $A$ does not have equal cardinality to $B$. Show that any set $X$
has strictly lesser cardinality than $2^X$. Also, show that if $A$ has strictly lesser
cardinality than $B$, and $B$ has strictly lesser cardinality than $C$, then $A$ has
strictly lesser cardinality than $C$.

Exercise 8.3.5. Show that no power set (i.e., a set of the form $2^X$ for some set
$X$) can be countably infinite.


8.4 The axiom of choice 

We now discuss the final axiom of the standard Zermelo-Fraenkel-Choice 
system of set theory, namely the axiom of choice. We have delayed intro- 
ducing this axiom for a while now, to demonstrate that a large portion of 
the foundations of analysis can be constructed without appealing to this 
axiom. However, in many further developments of the theory, it is very 
convenient (and in some cases even essential) to employ this powerful 
axiom. On the other hand, the axiom of choice can lead to a number 
of unintuitive consequences (for instance the Banach-Tarski paradox, a 
simplified version of which we will encounter in Section 11.43), and can
lead to proofs that are philosophically somewhat unsatisfying. Nevertheless,
the axiom is almost universally accepted by mathematicians.
One reason for this confidence is a theorem due to the great logician
Kurt Gödel, who showed that a result proven using the axiom of choice
will never contradict a result proven without the axiom of choice (unless
all the other axioms of set theory are themselves inconsistent, which is
highly unlikely). More precisely, Gödel demonstrated that the axiom of
choice cannot be disproved from the other axioms of set theory, so long
as those axioms are themselves consistent; Paul Cohen later showed that
it cannot be proved from them either, so the axiom is undecidable.
(From a set of inconsistent axioms one can prove that every
statement is both true and false.) In practice, this means that any
“real-life” application of analysis (more precisely, any application in- 
volving only “decidable” questions) which can be rigorously supported 
using the axiom of choice, can also be rigorously supported without the 
axiom of choice, though in many cases it would take a much more com- 
plicated and lengthier argument to do so if one were not allowed to use 
the axiom of choice. Thus one can view the axiom of choice as a con- 
venient and safe labour-saving device in analysis. In other disciplines 
of mathematics, notably in set theory in which many of the questions 
are not decidable, the issue of whether to accept the axiom of choice is 
more open to debate, and involves some philosophical concerns as well 
as mathematical and logical ones. However, we will not discuss these 
issues in this text. 

We begin by generalizing the notion of finite Cartesian products from 
Definition 3.5.7 to infinite Cartesian products. 

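Definition 8.4.1 (Infinite Cartesian products). Let I be a set (possibly
infinite), and for each $\alpha \in I$ let $X_\alpha$ be a set. We then define the Cartesian
product $\prod_{\alpha \in I} X_\alpha$ to be the set

$$\prod_{\alpha \in I} X_\alpha = \left\{ (x_\alpha)_{\alpha \in I} \in \Big(\bigcup_{\beta \in I} X_\beta\Big)^I : x_\alpha \in X_\alpha \text{ for all } \alpha \in I \right\},$$

where we recall (from Axiom 3.10) that $(\bigcup_{\alpha \in I} X_\alpha)^I$ is the set of
functions $(x_\alpha)_{\alpha \in I}$ which assign an element $x_\alpha \in \bigcup_{\beta \in I} X_\beta$ to each $\alpha \in I$.
Thus $\prod_{\alpha \in I} X_\alpha$ is a subset of that set of functions, consisting instead of
those functions $(x_\alpha)_{\alpha \in I}$ which assign an element $x_\alpha \in X_\alpha$ to each $\alpha \in I$.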
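
For a finite index set the definition above can be made concrete. The following Python sketch (an illustration only; the name cartesian_product and the dictionary representation are assumptions made here) represents each element of the product as a function from indices to values, stored as a dict.

```python
from itertools import product

def cartesian_product(family):
    """Return the product of a finite family {alpha: X_alpha} as a list of
    choice functions, each represented as a dict alpha -> x_alpha."""
    indices = list(family)
    return [dict(zip(indices, choice))
            for choice in product(*(family[a] for a in indices))]

family = {"a": {1, 2}, "b": {3}, "c": {4, 5, 6}}
tuples = cartesian_product(family)

# Every element assigns to each index alpha some x_alpha in X_alpha.
assert all(t[a] in family[a] for t in tuples for a in family)
# For finite families the product has #X_a * #X_b * #X_c elements.
assert len(tuples) == 2 * 1 * 3
# If one factor is empty, so is the product.
assert cartesian_product({"a": {1, 2}, "b": set()}) == []
```

In the finite case non-emptiness of the product can be verified by enumeration; for infinite index sets no such enumeration is available in general.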

Example 8.4.2. For any sets I and X, we have $\prod_{\alpha \in I} X = X^I$ (why?).
If I is a set of the form $I := \{i \in \mathbf{N} : 1 \leq i \leq n\}$, then $\prod_{i \in I} X_i$ is the
same set as the set $\prod_{1 \leq i \leq n} X_i$ defined in Definition 3.5.7 (why?).

Recall from Lemma 3.5.12 that if $X_1, \ldots, X_n$ were any finite collection
of non-empty sets, then the finite Cartesian product $\prod_{1 \leq i \leq n} X_i$ was
also non-empty. The axiom of choice asserts that this statement is also
true for infinite Cartesian products:

Axiom 8.1 (Choice). Let I be a set, and for each $\alpha \in I$, let $X_\alpha$ be
a non-empty set. Then $\prod_{\alpha \in I} X_\alpha$ is also non-empty. In other words,
there exists a function $(x_\alpha)_{\alpha \in I}$ which assigns to each $\alpha \in I$ an element
$x_\alpha \in X_\alpha$.

Remark 8.4.3. The intuition behind this axiom is that given a (possibly
infinite) collection of non-empty sets $X_\alpha$, one should be able to
choose a single element $x_\alpha$ from each one, and then form the possibly
infinite tuple $(x_\alpha)_{\alpha \in I}$ from all the choices one has made. On one hand,
this is a very intuitively appealing axiom; in some sense one is just applying
Lemma 3.1.6 over and over again. On the other hand, the fact that
one is making an infinite number of arbitrary choices, with no explicit 
rule as to how to make these choices, is a little disconcerting. Indeed, 
there are many theorems proven using the axiom of choice which assert 
the abstract existence of some object x with certain properties, with- 
out saying at all what that object is, or how to construct it. Thus the 
axiom of choice can lead to proofs which are non-constructive, demonstrating
existence of an object without actually constructing the object
explicitly. This problem is not unique to the axiom of choice - it already 
appears for instance in Lemma 3.1.6 - but the objects shown to exist 
using the axiom of choice tend to be rather extreme in their level of 
non-constructiveness. However, as long as one is aware of the distinc- 
tion between a non-constructive existence statement, and a constructive 
existence statement (with the latter being preferable, but not strictly 
necessary in many cases), there is no difficulty here, except perhaps on 
a philosophical level. 

Remark 8.4.4. There are many equivalent formulations of the axiom 
of choice; we give some of these in the exercises below. 

In analysis one often does not need the full power of the axiom of
choice. Instead, one often only needs the axiom of countable choice,
which is the same as the axiom of choice but with the index set I
restricted to be at most countable. We give a typical example of this
below.

Lemma 8.4.5. Let E be a non-empty subset of the real line with
$\sup(E) < \infty$ (i.e., E is bounded from above). Then there exists a
sequence $(a_n)_{n=1}^\infty$ whose elements $a_n$ all lie in E, such that
$\lim_{n \to \infty} a_n = \sup(E)$.

Proof. For each positive natural number n, let $X_n$ denote the set

$$X_n := \{x \in E : \sup(E) - 1/n \leq x \leq \sup(E)\}.$$

Since $\sup(E)$ is the least upper bound for E, $\sup(E) - 1/n$ cannot
be an upper bound for E, and so $X_n$ is non-empty for each n. Using the
axiom of choice (or the axiom of countable choice), we can then find a
sequence $(a_n)_{n=1}^\infty$ such that $a_n \in X_n$ for all $n \geq 1$. In particular $a_n \in E$
for all n, and $\sup(E) - 1/n \leq a_n \leq \sup(E)$ for all n. But then we have
$\lim_{n \to \infty} a_n = \sup(E)$ by the squeeze test (Corollary 6.4.14). □
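
To see the sets $X_n$ in action, here is a small Python sketch for one concrete set, chosen purely for illustration: E = {k/(k+1) : k = 1, 2, 3, ...}, whose supremum is 1. For this particular E an explicit selection rule exists, so the sketch does not need (and does not model) the axiom of choice; the names element and pick are assumptions of the example.

```python
from fractions import Fraction

SUP_E = Fraction(1)          # sup(E) for E = {k/(k+1) : k = 1, 2, 3, ...}

def element(k):
    """The k-th element k/(k+1) of the illustrative set E."""
    return Fraction(k, k + 1)

def pick(n):
    """Return some a_n in X_n = {x in E : sup(E) - 1/n <= x <= sup(E)}.

    Here an explicit rule (take the first k that works) is available,
    so no appeal to the axiom of choice is needed for this particular E."""
    k = 1
    while element(k) < SUP_E - Fraction(1, n):
        k += 1
    return element(k)

for n in range(1, 50):
    a_n = pick(n)
    assert SUP_E - Fraction(1, n) <= a_n <= SUP_E   # a_n lies in X_n
```

The two-sided bound checked in the loop is exactly what the squeeze test uses to conclude that the sequence converges to sup(E).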

Remark 8.4.6. In many special cases, one can obtain the conclusion
of this lemma without using the axiom of choice. For instance, if E is
a closed set (Definition 11.4.12) then one can define $a_n$ without choice
by the formula $a_n := \inf(X_n)$; the extra hypothesis that E is closed will
ensure that $a_n$ lies in E.

Another formulation of the axiom of choice is as follows. 

Proposition 8.4.7. Let X and Y be sets, and let P(x, y) be a property
pertaining to an object $x \in X$ and an object $y \in Y$ such that for every
$x \in X$ there is at least one $y \in Y$ such that P(x, y) is true. Then there
exists a function $f : X \to Y$ such that P(x, f(x)) is true for all $x \in X$.

Proof. See Exercise 8.4.1. □ 


— Exercises — 

Exercise 8.4.1. Show that the axiom of choice implies Proposition 8.4.7. (Hint:
consider the sets $Y_x := \{y \in Y : P(x, y) \text{ is true}\}$ for each $x \in X$.) Conversely,
show that if Proposition 8.4.7 is true, then the axiom of choice is also true.

Exercise 8.4.2. Let I be a set, and for each $\alpha \in I$ let $X_\alpha$ be a non-empty set.
Suppose that all the sets $X_\alpha$ are disjoint from each other, i.e., $X_\alpha \cap X_\beta = \emptyset$
for all distinct $\alpha, \beta \in I$. Using the axiom of choice, show that there exists a set
Y such that $\#(Y \cap X_\alpha) = 1$ for all $\alpha \in I$ (i.e., Y intersects each $X_\alpha$ in exactly
one element). Conversely, show that if the above statement was true for an
arbitrary choice of sets I and non-empty disjoint sets $X_\alpha$, then the axiom of
choice is true. (Hint: the problem is that in Axiom 8.1 the sets $X_\alpha$ are not
assumed to be disjoint. But this can be fixed by the trick of looking at the
sets $\{\alpha\} \times X_\alpha = \{(\alpha, x) : x \in X_\alpha\}$ instead.)
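
The tagging trick in the hint can be illustrated on a small finite family. In this Python sketch (the function name disjointify is an assumption made for illustration) each set is replaced by a copy whose elements are paired with the index of the set.

```python
def disjointify(family):
    """Replace each X_alpha by {alpha} x X_alpha = {(alpha, x) : x in X_alpha}."""
    return {a: {(a, x) for x in xs} for a, xs in family.items()}

family = {"a": {1, 2}, "b": {2, 3}, "c": {1, 3}}   # overlapping sets
tagged = disjointify(family)

# The tagged sets are pairwise disjoint, even though the originals overlap.
keys = list(tagged)
assert all(tagged[p].isdisjoint(tagged[q])
           for i, p in enumerate(keys) for q in keys[i + 1:])
# Each tagged set has the same number of elements as the original.
assert all(len(tagged[a]) == len(family[a]) for a in family)
```

Dropping the tag from a chosen pair recovers an element of the original set, which is how a selection from the disjoint copies yields a selection from the original family.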

Exercise 8.4.3. Let A and B be sets such that there exists a surjection
$g : B \to A$. Using the axiom of choice, show that there then exists an injection
$f : A \to B$; in other words A has lesser or equal cardinality to B in the sense
of Exercise 3.6.7. (Hint: consider the inverse images $g^{-1}(\{a\})$ for each $a \in A$.)
Compare this with Exercise 3.6.8. Conversely, show that if the above statement
is true for arbitrary sets A, B and surjections $g : B \to A$, then the axiom of
choice is true. (Hint: use Exercise 8.4.2.)


8.5 Ordered sets 

The axiom of choice is intimately connected to the theory of ordered sets. 
There are actually many types of ordered sets; we will concern ourselves 
with three such types, the partially ordered sets, the totally ordered sets,
and the well-ordered sets. 

Definition 8.5.1 (Partially ordered sets). A partially ordered set (or
poset) is a set X, together¹ with a relation $\leq_X$ on X (thus for any two
objects $x, y \in X$, the statement $x \leq_X y$ is either a true statement or
a false statement). Furthermore, this relation is assumed to obey the
following three properties:

• (Reflexivity) For any $x \in X$, we have $x \leq_X x$.

• (Anti-symmetry) If $x, y \in X$ are such that $x \leq_X y$ and $y \leq_X x$,
then x = y.

• (Transitivity) If $x, y, z \in X$ are such that $x \leq_X y$ and $y \leq_X z$,
then $x \leq_X z$.

We refer to $\leq_X$ as the ordering relation. In most situations it is understood
what the set X is from context, and in those cases we shall simply
write $\leq$ instead of $\leq_X$. We write $x <_X y$ (or $x < y$ for short) if $x \leq_X y$
and $x \neq y$.
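
On a finite set the three properties can be checked by brute force. The following Python sketch is one way to do so; the function is_partial_order and the test relations are illustrative assumptions, not part of the text.

```python
from itertools import product

def is_partial_order(X, leq):
    """Check reflexivity, anti-symmetry and transitivity of leq on the finite set X."""
    X = list(X)
    reflexive = all(leq(x, x) for x in X)
    antisymmetric = all(x == y or not (leq(x, y) and leq(y, x))
                        for x, y in product(X, repeat=2))
    transitive = all(leq(x, z) or not (leq(x, y) and leq(y, z))
                     for x, y, z in product(X, repeat=3))
    return reflexive and antisymmetric and transitive

numbers = range(1, 13)
assert is_partial_order(numbers, lambda a, b: a <= b)        # usual ordering
assert is_partial_order(numbers, lambda a, b: b % a == 0)    # a divides b
assert not is_partial_order(numbers, lambda a, b: a < b)     # not reflexive

sets = [frozenset(s) for s in ({1, 2}, {2}, {2, 3}, {2, 3, 4}, {5})]
assert is_partial_order(sets, lambda a, b: a <= b)           # set inclusion
```

Such a check only covers finite examples; for infinite sets such as N the properties must be proved.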

¹ Strictly speaking, a partially ordered set is not a set X, but rather a pair $(X, \leq_X)$.
But in many cases the ordering $\leq_X$ will be clear from context, and so we shall refer
to X itself as the partially ordered set even though this is technically incorrect.

Examples 8.5.2. The natural numbers N together with the usual
less-than-or-equal-to relation $\leq$ (as defined in Definition 2.2.11) forms a
partially ordered set, by Proposition 2.2.12. Similar arguments (using the
appropriate definitions and propositions) show that the integers Z, the
rationals Q, the reals R, and the extended reals R* are also partially
ordered sets. Meanwhile, if X is any collection of sets, and one uses the
relation of is-a-subset-of $\subseteq$ (as defined in Definition 3.1.15) for the
ordering relation $\leq_X$, then X is also partially ordered (Proposition 3.1.18).
Note that it is certainly possible to give these sets a different partial
ordering than the standard one; see for instance Exercise 8.5.3.

Definition 8.5.3 (Totally ordered set). Let X be a partially ordered
set with some order relation $\leq_X$. A subset Y of X is said to be totally
ordered if, given any two $y, y' \in Y$, we either have $y \leq_X y'$ or $y' \leq_X y$
(or both). If X itself is totally ordered, we say that X is a totally ordered
set (or chain) with order relation $\leq_X$.

Examples 8.5.4. The natural numbers N, the integers Z, the rationals
Q, reals R, and the extended reals R*, all with the usual ordering
relation $\leq$, are totally ordered (by Proposition 2.2.13, Lemma
4.1.11, Proposition 4.2.9, Proposition 5.4.7, and Proposition 6.2.5
respectively). Also, any subset of a totally ordered set is again totally
ordered (why?). On the other hand, a collection of sets with the $\subseteq$
relation is usually not totally ordered. For instance, if X is the set
{{1, 2}, {2}, {2, 3}, {2, 3, 4}, {5}}, ordered by the set inclusion relation
$\subseteq$, then the elements {1, 2} and {2, 3} of X are not comparable to each
other (i.e., $\{1, 2\} \not\subseteq \{2, 3\}$ and $\{2, 3\} \not\subseteq \{1, 2\}$).

Definition 8.5.5 (Maximal and minimal elements). Let X be a partially
ordered set, and let Y be a subset of X. We say that y is a minimal
element of Y if $y \in Y$ and there is no element $y' \in Y$ such that $y' < y$.
We say that y is a maximal element of Y if $y \in Y$ and there is no
element $y' \in Y$ such that $y < y'$.

Example 8.5.6. Using the set X from the previous example, {2} is a
minimal element, {1,2} and {2,3,4} are maximal elements, {5} is both 
a minimal and a maximal element, and {2,3} is neither a minimal nor 
a maximal element. This example shows that a partially ordered set 
can have multiple maxima and minima; however, a totally ordered set 
cannot (Exercise 8.5.7). 
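
The claims in this example can be confirmed mechanically. The Python sketch below (function names are illustrative assumptions) computes the minimal and maximal elements of the collection {{1, 2}, {2}, {2, 3}, {2, 3, 4}, {5}} under strict set inclusion.

```python
def minimal_elements(Y, lt):
    """Elements y of Y with no y' in Y such that y' < y."""
    return [y for y in Y if not any(lt(z, y) for z in Y)]

def maximal_elements(Y, lt):
    """Elements y of Y with no y' in Y such that y < y'."""
    return [y for y in Y if not any(lt(y, z) for z in Y)]

X = [frozenset(s) for s in ({1, 2}, {2}, {2, 3}, {2, 3, 4}, {5})]
proper_subset = lambda a, b: a < b      # strict inclusion on frozensets

assert set(minimal_elements(X, proper_subset)) == {frozenset({2}), frozenset({5})}
assert set(maximal_elements(X, proper_subset)) == {
    frozenset({1, 2}), frozenset({2, 3, 4}), frozenset({5})}
```

Note that {5} appears in both lists because it is not comparable to any other element of the collection.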

Example 8.5.7. The natural numbers N (ordered by $\leq$) has a minimal
element, namely 0, but no maximal element. The set of integers Z has
no maximal and no minimal element.

Definition 8.5.8 (Well-ordered sets). Let X be a partially ordered set,
and let Y be a totally ordered subset of X. We say that Y is well-ordered
if every non-empty subset of Y has a minimal element min(Y).

Examples 8.5.9. The natural numbers N are well-ordered by Propo- 
sition 8.1.4. However, the integers Z, the rationals Q, and the real 
numbers R are not (see Exercise 8.1.2). Every finite totally ordered set 
is well-ordered (Exercise 8.5.8). Every subset of a well-ordered set is 
again well-ordered (why?). 

One advantage of well-ordered sets is that they automatically obey 
a principle of strong induction (cf. Proposition 2.2.14): 

Proposition 8.5.10 (Principle of strong induction). Let X be a well-ordered
set with an ordering relation $\leq$, and let P(n) be a property pertaining
to an element $n \in X$ (i.e., for each $n \in X$, P(n) is either a true
statement or a false statement). Suppose that for every $n \in X$, we have
the following implication: if P(m) is true for all $m \in X$ with $m <_X n$,
then P(n) is also true. Then P(n) is true for all $n \in X$.

Remark 8.5.11. It may seem strange that there is no “base” case 
in strong induction, corresponding to the hypothesis P(0) in Axiom 
2.5. However, such a base case is automatically included in the strong 
induction hypothesis. Indeed, if 0 is the minimal element of X, then by 
specializing the hypothesis “if P(m) is true for all $m \in X$ with $m <_X n$,
then P(n) is also true” to the n = 0 case, we automatically obtain that 
P(0) is true. (Why?) 

Proof. See Exercise 8.5.10. □ 

So far we have not seen the axiom of choice play any role. This will 
come in once we introduce the notion of an upper bound and a strict 
upper bound. 

Definition 8.5.12 (Upper bounds and strict upper bounds). Let X be
a partially ordered set with ordering relation $\leq$, and let Y be a subset
of X. If $x \in X$, we say that x is an upper bound for Y iff $y \leq x$ for all
$y \in Y$. If in addition $x \notin Y$, we say that x is a strict upper bound for
Y. Equivalently, x is a strict upper bound for Y iff $y < x$ for all $y \in Y$.
(Why is this equivalent?)

Example 8.5.13. Let us work in the real number system R with the
usual ordering $\leq$. Then 2 is an upper bound for the set
$\{x \in \mathbf{R} : 1 \leq x \leq 2\}$ but is not a strict upper bound. The number 3, on the other
hand, is a strict upper bound for this set.

Lemma 8.5.14. Let X be a partially ordered set with ordering relation
$\leq$, and let $x_0$ be an element of X. Then there is a well-ordered subset
Y of X which has $x_0$ as its minimal element, and which has no strict
upper bound.

Proof. The intuition behind this lemma is that one is trying to perform
the following algorithm: we initialize $Y := \{x_0\}$. If Y has no strict upper
bound, then we are done; otherwise, we choose a strict upper bound and
add it to Y. Then we look again to see if Y has a strict upper bound
or not. If not, we are done; otherwise we choose another strict upper
bound and add it to Y. We continue this algorithm “infinitely often”
until we exhaust all the strict upper bounds; the axiom of choice comes
in because infinitely many choices are involved. This is however not a
rigorous proof because it is quite difficult to precisely pin down what it
means to perform an algorithm “infinitely often”. Instead, we will
isolate a collection of “partially completed” sets Y,
which we shall call good sets, and then take the union of all these good
sets to obtain a “completed” object $Y_\infty$ which will indeed have no strict
upper bound.
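
On a finite partially ordered set the informal algorithm above does terminate, and can be run directly. The following Python sketch is an illustration under that finiteness assumption only; the names strict_upper_bounds and grow_chain, and the parameter choose (a hypothetical stand-in for the selection of a strict upper bound), are not from the text.

```python
def strict_upper_bounds(X, leq, Y):
    """All x in X with y <= x for every y in Y and x not in Y."""
    return [x for x in X if x not in Y and all(leq(y, x) for y in Y)]

def grow_chain(X, leq, x0, choose=min):
    """Start from {x0} and keep adding a chosen strict upper bound until none is left.

    On a finite poset this loop terminates, so no axiom of choice is needed;
    `choose` is a hypothetical stand-in for the choice function s in the proof."""
    Y = [x0]
    while True:
        candidates = strict_upper_bounds(X, leq, Y)
        if not candidates:
            return Y
        Y.append(choose(candidates))

X = range(1, 25)
divides = lambda a, b: b % a == 0
chain = grow_chain(X, divides, 1)

assert chain == [1, 2, 4, 8, 16]                       # choose=min picks the least candidate
assert strict_upper_bounds(X, divides, chain) == []    # no strict upper bound remains
```

Different selection rules give different chains (for instance choose=max stops after one step here), which mirrors the role of the arbitrary choices in the infinite setting.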

We now begin the rigorous proof. Suppose for sake of contradiction
that every well-ordered subset Y of X which has $x_0$ as its minimal
element has at least one strict upper bound. Using the axiom of choice
(in the form of Proposition 8.4.7), we can thus assign a strict upper
bound $s(Y) \in X$ to each well-ordered subset Y of X which has $x_0$ as its
minimal element.

Let us define a special class of subsets Y of X. We say that a subset
Y of X is good iff it is well-ordered, contains $x_0$ as its minimal element,
and obeys the property that

$$x = s(\{y \in Y : y < x\}) \text{ for all } x \in Y \setminus \{x_0\}.$$

Note that if $x \in Y \setminus \{x_0\}$ then the set $\{y \in Y : y < x\}$ is a subset of
X which is well-ordered and contains $x_0$ as its minimal element. Let
$\Omega := \{Y \subseteq X : Y \text{ is good}\}$ be the collection of all good subsets of X.
This collection is not empty, since the subset $\{x_0\}$ of X is clearly good
(why?).

We make the following important observation: if Y and Y' are two
good subsets of X, then every element of $Y' \setminus Y$ is a strict upper bound for
Y, and every element of $Y \setminus Y'$ is a strict upper bound for Y' (Exercise
8.5.13). In particular, given any two good sets Y and Y', at least one of
$Y' \setminus Y$ and $Y \setminus Y'$ must be empty (since otherwise any $y' \in Y' \setminus Y$ and
$y \in Y \setminus Y'$ would satisfy both $y < y'$ and $y' < y$).
In other words, $\Omega$ is totally ordered by set inclusion: given
any two good sets Y and Y', either $Y \subseteq Y'$ or $Y' \subseteq Y$.

Let $Y_\infty := \bigcup \Omega$, i.e., $Y_\infty$ is the set of all elements of X which belong
to at least one good subset of X. Clearly $x_0 \in Y_\infty$. Also, since each
good subset of X has $x_0$ as its minimal element, the set $Y_\infty$ also has $x_0$
as its minimal element (why?).

Next, we show that $Y_\infty$ is totally ordered. Let x, x' be two elements
of $Y_\infty$. By definition of $Y_\infty$, we know that x lies in some good set Y
and x' lies in some good set Y'. But since $\Omega$ is totally ordered, one of
these good sets contains the other. Thus x, x' are contained in a single
good set (either Y or Y'); since good sets are totally ordered, we thus
see that either $x \leq x'$ or $x' \leq x$ as desired.

Next, we show that $Y_\infty$ is well-ordered. Let A be any non-empty
subset of $Y_\infty$. Then we can pick an element $a \in A$, which then lies in
$Y_\infty$. Therefore there is a good set Y such that $a \in Y$. Then $A \cap Y$ is a
non-empty subset of Y; since Y is well-ordered, the set $A \cap Y$ thus has
a minimal element, call it b. Now recall that for any other good set Y',
every element of $Y' \setminus Y$ is a strict upper bound for Y, and in particular is
larger than b. Since b is a minimal element of $A \cap Y$, this implies that b
is also a minimal element of $A \cap Y'$ for any good set Y' with $A \cap Y' \neq \emptyset$
(why?). Since every element of A belongs to $Y_\infty$ and hence belongs to
at least one good set Y', we thus see that b is a minimal element of A.
Thus $Y_\infty$ is well-ordered as claimed.

Since $Y_\infty$ is well-ordered with $x_0$ as its minimal element, it has a
strict upper bound $s(Y_\infty)$. But then $Y_\infty \cup \{s(Y_\infty)\}$ is well-ordered (why?
see Exercise 8.5.11) and has $x_0$ as its minimal element (why?). Thus
this set is good, and must therefore be contained in $Y_\infty$. But this is a
contradiction since $s(Y_\infty)$ is a strict upper bound for $Y_\infty$. Thus we have
constructed a set with no strict upper bound, as desired. □

The above lemma has the following important consequence: 

Lemma 8.5.15 (Zorn’s lemma). Let X be a non-empty partially ordered 
set, with the property that every totally ordered subset Y of X has an 
upper bound. Then X contains at least one maximal element. 

Proof. See Exercise 8.5.14. □

We give some applications of Zorn’s lemma (also called the principle
of transfinite induction) in the exercises below.

— Exercises — 

Exercise 8.5.1. Consider the empty set $\emptyset$ with the empty order relation $\leq_\emptyset$
(this relation is vacuous because the empty set has no elements). Is this set
partially ordered? totally ordered? well-ordered? Explain.

Exercise 8.5.2. Give examples of a set X and a relation $\leq$ such that

(a) The relation $\leq$ is reflexive and anti-symmetric, but not transitive;

(b) The relation $\leq$ is reflexive and transitive, but not anti-symmetric;

(c) The relation $\leq$ is anti-symmetric and transitive, but not reflexive.

Exercise 8.5.3. Given two positive integers $n, m \in \mathbf{N} \setminus \{0\}$, we say that n divides
m, and write n|m, if there exists a positive integer a such that m = na. Show
that the set $\mathbf{N} \setminus \{0\}$ with the ordering relation | is a partially ordered set but
not a totally ordered one. Note that this is a different ordering relation from
the usual $\leq$ ordering of $\mathbf{N} \setminus \{0\}$.

Exercise 8.5.4. Show that the set of positive reals $\mathbf{R}^+ := \{x \in \mathbf{R} : x > 0\}$ has
no minimal element.

Exercise 8.5.5. Let $f : X \to Y$ be a function from one set X to another set Y.
Suppose that Y is partially ordered with some ordering relation $\leq_Y$. Define a
relation $\leq_X$ on X by defining $x \leq_X x'$ if and only if $f(x) <_Y f(x')$ or x = x'.
Show that this relation $\leq_X$ turns X into a partially ordered set. If we know in
addition that the relation $\leq_Y$ makes Y totally ordered, does this mean that the
relation $\leq_X$ makes X totally ordered also? If not, what additional assumption
needs to be made on f in order to ensure that $\leq_X$ makes X totally ordered?

Exercise 8.5.6. Let X be a partially ordered set. For any x in X, define the
order ideal $(x) \subseteq X$ to be the set $(x) := \{y \in X : y \leq x\}$. Let
$(X) := \{(x) : x \in X\}$ be the set of all order ideals, and let $f : X \to (X)$ be the
map f(x) := (x) that sends every element x of X to its order ideal. Show that f
is a bijection, and that given any $x, y \in X$, we have $x \leq_X y$ if and only if
$f(x) \subseteq f(y)$. This exercise
shows that any partially ordered set can be represented by a collection of sets
whose ordering relation is given by set inclusion.
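
For a concrete feel for this representation, the following Python sketch computes the order ideals of {1, ..., 12} under divisibility and checks the two claimed properties on that finite example. The function name order_ideal is an assumption; the check illustrates the statement and does not replace the proof asked for in the exercise.

```python
def order_ideal(X, leq, x):
    """The order ideal (x) := {y in X : y <= x}, as a frozenset."""
    return frozenset(y for y in X if leq(y, x))

X = range(1, 13)
divides = lambda a, b: b % a == 0
ideal = {x: order_ideal(X, divides, x) for x in X}

assert ideal[6] == frozenset({1, 2, 3, 6})
# Distinct elements have distinct ideals, and x | y exactly when (x) is a subset of (y).
assert len(set(ideal.values())) == len(X)
assert all(divides(x, y) == (ideal[x] <= ideal[y]) for x in X for y in X)
```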

Exercise 8.5.7. Let X be a partially ordered set, and let Y be a totally ordered 
subset of X. Show that Y can have at most one maximum and at most one 
minimum. 

Exercise 8.5.8. Show that every finite non-empty subset of a totally ordered set 
has a minimum and a maximum. (Hint: use induction.) Conclude in particular 
that every finite totally ordered set is well-ordered.

Exercise 8.5.9. Let X be a totally ordered set such that every non-empty subset
of X has both a minimum and a maximum. Show that X is finite. (Hint:
assume for sake of contradiction that X is infinite. Start with the minimal
element $x_0$ of X and then construct an increasing sequence $x_0 < x_1 < \ldots$ in
X.)

Exercise 8.5.10. Prove Proposition 8.5.10, without using the axiom of choice.
(Hint: consider the set

$$Y := \{n \in X : P(m) \text{ is false for some } m \in X \text{ with } m \leq_X n\},$$

and show that Y being non-empty would lead to a contradiction.)

Exercise 8.5.11. Let X be a partially ordered set, and let Y and Y' be
well-ordered subsets of X. Show that $Y \cup Y'$ is well-ordered if and only if it is
totally ordered.

Exercise 8.5.12. Let X and Y be partially ordered sets with ordering relations
$\leq_X$ and $\leq_Y$ respectively. Define a relation $\leq_{X \times Y}$ on the Cartesian product
$X \times Y$ by defining $(x, y) \leq_{X \times Y} (x', y')$ if $x <_X x'$, or if x = x' and $y \leq_Y y'$.
(This is called the lexicographical ordering on $X \times Y$, and is similar to the
alphabetical ordering of words; a word w appears earlier in a dictionary than
another word w' if the first letter of w is earlier in the alphabet than the first
letter of w', or if the first letters match and the second letter of w is earlier
than the second letter of w', and so forth.) Show that $\leq_{X \times Y}$ defines a partial
ordering on $X \times Y$. Furthermore, show that if X and Y are totally ordered,
then so is $X \times Y$, and if X and Y are well-ordered, then so is $X \times Y$.
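
The lexicographical ordering is easy to experiment with. The Python sketch below builds it from two given relations (the helper name lex_leq is an assumption made for illustration) and compares it with Python's built-in tuple comparison, which is also lexicographic.

```python
def lex_leq(leq_x, leq_y):
    """Build the lexicographic relation on pairs from relations on X and on Y."""
    def leq(p, q):
        (x, y), (x2, y2) = p, q
        strictly_less = leq_x(x, x2) and x != x2      # x <_X x'
        return strictly_less or (x == x2 and leq_y(y, y2))
    return leq

usual = lambda a, b: a <= b
leq = lex_leq(usual, usual)

assert leq((1, 9), (2, 0))        # first coordinates decide
assert leq((1, 3), (1, 5))        # tie broken by second coordinate
assert not leq((2, 0), (1, 9))
# With the usual orderings this agrees with Python's built-in tuple comparison.
pairs = [(x, y) for x in range(4) for y in range(4)]
assert all(leq(p, q) == (p <= q) for p in pairs for q in pairs)
```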

Exercise 8.5.13. Prove the claim in the proof of Lemma 8.5.14, namely that
every element of $Y' \setminus Y$ is an upper bound for Y and vice versa. (Hint: Show
using Proposition 8.5.10 that

$$\{y \in Y : y \leq a\} = \{y \in Y' : y \leq a\} = \{y \in Y \cap Y' : y \leq a\}$$

for all $a \in Y \cap Y'$. Conclude that $Y \cap Y'$ is good, and hence $s(Y \cap Y')$ exists.
Show that $s(Y \cap Y') = \min(Y' \setminus Y)$ if $Y' \setminus Y$ is non-empty, and similarly with
Y and Y' interchanged. Since $Y' \setminus Y$ and $Y \setminus Y'$ are disjoint, one can then
conclude that one of these sets is empty, at which point the claim becomes
easy to establish.)

Exercise 8.5.14. Use Lemma 8.5.14 to prove Lemma 8.5.15. (Hint: first show 
that if X had no maximal elements, then any subset of X which has an upper 
bound, also has a strict upper bound.) 

Exercise 8.5.15. Let A and B be two non-empty sets such that A does not have
lesser or equal cardinality to B. Using the principle of transfinite induction,
prove that B has lesser or equal cardinality to A. (Hint: for every subset
$X \subseteq B$, let P(X) denote the property that there exists an injective map from X
to A.) This exercise (combined with Exercise 8.3.3) shows that the cardinality
of any two sets is comparable, as long as one assumes the axiom of choice.

Exercise 8.5.16. Let X be a set, and let P be the set of all partial orderings
of X. (For instance, if $X := \mathbf{N} \setminus \{0\}$, then both the usual partial ordering $\leq$,
and the partial ordering in Exercise 8.5.3, are elements of P.) We say that one
partial ordering $\leq \in P$ is coarser than another partial ordering $\leq' \in P$ if for any
$x, y \in X$, we have the implication $(x \leq y) \implies (x \leq' y)$. Thus for instance
the partial ordering in Exercise 8.5.3 is coarser than the usual ordering $\leq$. Let
us write $\leq \preceq \leq'$ if $\leq$ is coarser than $\leq'$. Show that $\preceq$ turns P into a partially
ordered set; thus the set of partial orderings on X is itself partially ordered.
There is exactly one minimal element of P; what is it? Show that the maximal
elements of P are precisely the total orderings of X. Using Zorn’s lemma, show
that given any partial ordering $\leq$ of X there exists a total ordering $\leq'$ such
that $\leq$ is coarser than $\leq'$.

Exercise 8.5.17. Use Zorn’s lemma to give another proof of the claim in Exercise
8.4.2. (Hint: let $\Omega$ be the set of all $Y \subseteq \bigcup_{\alpha \in I} X_\alpha$ such that $\#(Y \cap X_\alpha) \leq 1$
for all $\alpha \in I$, i.e., all sets which intersect each $X_\alpha$ in at most one element.
Use Zorn’s lemma to locate a maximal element of $\Omega$.) Deduce that Zorn’s
lemma and the axiom of choice are in fact logically equivalent (i.e., they can
be deduced from each other).

Exercise 8.5.18. Using Zorn’s lemma, prove Hausdorff’s maximality principle:
if X is a partially ordered set, then there exists a totally ordered subset Y of
X which is maximal with respect to set inclusion (i.e., there is no other totally
ordered subset Y' of X which contains Y). Conversely, show that if Hausdorff’s
maximality principle is true, then Zorn’s lemma is true. Thus by Exercise
8.5.17, these two statements are logically equivalent to the axiom of choice.

Exercise 8.5.19. Let X be a set, and let $\Omega$ be the space of all pairs $(Y, \leq)$,
where Y is a subset of X and $\leq$ is a well-ordering of Y. If $(Y, \leq)$ and $(Y', \leq')$
are elements of $\Omega$, we say that $(Y, \leq)$ is an initial segment of $(Y', \leq')$ if there
exists an $x \in Y'$ such that $Y := \{y \in Y' : y <' x\}$ (so in particular $Y \subsetneq Y'$),
and for any $y, y' \in Y$, $y \leq y'$ if and only if $y \leq' y'$. Define a relation $\preceq$ on $\Omega$ by
defining $(Y, \leq) \preceq (Y', \leq')$ if either $(Y, \leq) = (Y', \leq')$, or if $(Y, \leq)$ is an initial
segment of $(Y', \leq')$. Show that $\preceq$ is a partial ordering of $\Omega$. There is exactly
one minimal element of $\Omega$; what is it? Show that the maximal elements of $\Omega$
are precisely the well-orderings $(X, \leq)$ of X. Using Zorn’s lemma, conclude the
well-ordering principle: every set X has at least one well-ordering. Conversely,
use the well-ordering principle to prove the axiom of choice, Axiom 8.1. (Hint:
place a well-ordering $\leq$ on $\bigcup_{\alpha \in I} X_\alpha$, and then consider the minimal elements
of each $X_\alpha$.) We thus see that the axiom of choice, Zorn’s lemma, and the
well-ordering principle are all logically equivalent to each other.

Exercise 8.5.20. Let X be a set, and let $\Omega \subseteq 2^X$ be a collection of subsets of
X. Assume that $\Omega$ does not contain the empty set $\emptyset$. Using Zorn’s lemma,
show that there is a subcollection $\Omega' \subseteq \Omega$ such that all the elements of $\Omega'$ are
disjoint from each other (i.e., $A \cap B = \emptyset$ whenever A, B are distinct elements
of $\Omega'$), but that all the elements of $\Omega$ intersect at least one element of $\Omega'$ (i.e.,
for all $C \in \Omega$ there exists $A \in \Omega'$ such that $C \cap A \neq \emptyset$). (Hint: consider all
the subsets of $\Omega$ whose elements are all disjoint from each other, and locate
a maximal element of this collection.) Conversely, if the above claim is true,
show that it implies the claim in Exercise 8.4.2, and thus this is yet another
claim which is logically equivalent to the axiom of choice. (Hint: let $\Omega$ be the
set of all pair sets of the form $\{(0, \alpha), (1, x_\alpha)\}$, where $\alpha \in I$ and $x_\alpha \in X_\alpha$.)



Chapter 9 


Continuous functions on R 


In previous chapters we have been focusing primarily on sequences. A 
sequence $(a_n)_{n=0}^\infty$ can be viewed as a function from N to R, i.e., an
object which assigns a real number $a_n$ to each natural number n. We
then did various things with these functions from N to R, such as take 
their limit at infinity (if the function was convergent), or form suprema, 
infima, etc., or computed the sum of all the elements in the sequence 
(again, assuming the series was convergent). 

Now we will look at functions not on the natural numbers N, which 
are “discrete”, but instead look at functions on a continuum¹ such as
the real line R, or perhaps on an interval such as $\{x \in \mathbf{R} : a \leq x \leq b\}$.
Eventually we will perform a number of operations on these functions, 
including taking limits, computing derivatives, and evaluating integrals. 
In this chapter we will focus primarily on limits of functions, and on the 
closely related concept of a continuous function. 

Before we discuss functions, though, we must first set out some no- 
tation for subsets of the real line. 


9.1 Subsets of the real line 

Very often in analysis we do not work on the whole real line R, but 
on certain subsets of the real line, such as the positive real axis
$\{x \in \mathbf{R} : x > 0\}$. Also, we occasionally work with the extended real line R*
defined in Section 6.2, or in subsets of that extended real line. 


¹ We will not rigorously define the notion of a discrete set or a continuum in this
text, but roughly speaking a set is discrete if each element is separated from the rest
of the set by some non-zero distance, whereas a set is a continuum if it is connected
and contains no “holes”.

There are of course infinitely many subsets of the real line; indeed,
Cantor’s theorem (Theorem 8.3.1; see also Exercise 8.3.4) shows that
there are even more such sets than there are real numbers. However,
there are certain special subsets of the real line (and the extended real
line) which arise quite often. One such family of sets are the intervals.

Definition 9.1.1 (Intervals). Let $a, b \in \mathbf{R}^*$ be extended real numbers.
We define the closed interval [a, b] by

$$[a, b] := \{x \in \mathbf{R}^* : a \leq x \leq b\},$$

the half-open intervals [a, b) and (a, b] by

$$[a, b) := \{x \in \mathbf{R}^* : a \leq x < b\}; \quad (a, b] := \{x \in \mathbf{R}^* : a < x \leq b\},$$

and the open intervals (a, b) by

$$(a, b) := \{x \in \mathbf{R}^* : a < x < b\}.$$

We call a the left endpoint of these intervals, and b the right endpoint.
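
The four kinds of interval differ only in which endpoint inequalities are strict. A minimal Python sketch of the membership test follows; the function in_interval and its boolean flags are assumptions made for illustration, and math.inf and -math.inf stand in for the extended real numbers $+\infty$ and $-\infty$.

```python
import math

def in_interval(x, a, b, left_closed, right_closed):
    """Membership test for the interval with endpoints a, b in the extended reals.

    -math.inf and math.inf stand in for the extended real numbers -oo and +oo."""
    left_ok = a <= x if left_closed else a < x
    right_ok = x <= b if right_closed else x < b
    return left_ok and right_ok

assert in_interval(2, 2, 3, True, False)             # 2 lies in [2, 3)
assert not in_interval(3, 2, 3, True, False)         # 3 does not lie in [2, 3)
assert in_interval(1e9, 0, math.inf, False, False)   # positive real axis (0, +oo)
assert in_interval(math.inf, -math.inf, math.inf, True, True)       # R* = [-oo, +oo]
assert not in_interval(math.inf, -math.inf, math.inf, False, False) # R = (-oo, +oo)
# When a = b, the closed interval [a, a] is the singleton {a}, while (a, a) is empty.
assert in_interval(5, 5, 5, True, True)
assert not in_interval(5, 5, 5, False, False)
```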

Remark 9.1.2. Once again, we are overloading the parenthesis notation;
for instance, we are now using (2, 3) to denote both an open
interval from 2 to 3, as well as an ordered pair in the Cartesian plane
$\mathbf{R}^2 := \mathbf{R} \times \mathbf{R}$. This can cause some genuine ambiguity, but the reader
should still be able to resolve which meaning of the parentheses is intended
from context. In some texts, this issue is resolved by using reversed
brackets instead of parentheses, thus for instance [a, b) would now
be [a, b[, (a, b] would be ]a, b], and (a, b) would be ]a, b[.

Examples 9.1.3. If $a$ and $b$ are real numbers (i.e., not equal to $+\infty$
or $-\infty$) then all of the above intervals are subsets of the real line, for
instance $[2, 3) = \{x \in \mathbf{R} : 2 \leq x < 3\}$. The positive real axis
$\{x \in \mathbf{R} : x > 0\}$ is the open interval $(0, +\infty)$, while the non-negative real
axis $\{x \in \mathbf{R} : x \geq 0\}$ is the half-open interval $[0, +\infty)$. Similarly, the
negative real axis $\{x \in \mathbf{R} : x < 0\}$ is $(-\infty, 0)$, and the non-positive real
axis $\{x \in \mathbf{R} : x \leq 0\}$ is $(-\infty, 0]$. Finally, the real line $\mathbf{R}$ itself is the open
interval $(-\infty, +\infty)$, while the extended real line $\mathbf{R}^*$ is the closed interval
$[-\infty, +\infty]$. We sometimes refer to an interval in which one endpoint is
infinite (either $+\infty$ or $-\infty$) as a half-infinite interval, and an interval in
which both endpoints are infinite as a doubly-infinite interval; all other
intervals are bounded intervals. Thus $[2, 3)$ is a bounded interval, the
positive and negative real axes are half-infinite intervals, and $\mathbf{R}$ and $\mathbf{R}^*$
are doubly-infinite intervals.

Example 9.1.4. If $a > b$ then all four of the intervals $[a, b]$, $[a, b)$, $(a, b]$,
and $(a, b)$ are the empty set (why?). If $a = b$, then the three intervals
$[a, b)$, $(a, b]$, and $(a, b)$ are the empty set, while $[a, b]$ is just the singleton
set $\{a\}$ (why?). Because of this, we call these intervals degenerate; most
(but not all) of our analysis will be restricted to non-degenerate intervals.
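
As a computational aside, the four kinds of interval can be mirrored by a small membership test. The following is only a sketch: the class name `Interval`, its field names, and the use of `math.inf` to stand in for $+\infty$ are assumptions made for illustration and do not come from the text.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Interval with endpoints a, b in the extended real line (hypothetical helper)."""
    a: float
    b: float
    left_closed: bool
    right_closed: bool

    def __contains__(self, x: float) -> bool:
        # Use <= at a closed endpoint and < at an open one, as in Definition 9.1.1.
        left_ok = self.a <= x if self.left_closed else self.a < x
        right_ok = x <= self.b if self.right_closed else x < self.b
        return left_ok and right_ok

    def is_degenerate(self) -> bool:
        # Degenerate means a >= b: the interval is empty or a single point.
        return self.a >= self.b


half_open = Interval(2, 3, left_closed=True, right_closed=False)            # [2, 3)
positive_axis = Interval(0, math.inf, left_closed=False, right_closed=False)  # (0, +oo)
singleton = Interval(5, 5, left_closed=True, right_closed=True)             # [5, 5] = {5}
```

With these definitions, `2 in half_open` holds while `3 in half_open` does not, and `singleton` is degenerate in the sense of Example 9.1.4.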

Of course intervals are not the only interesting subsets of the real
line. Other important examples include the natural numbers $\mathbf{N}$, the
integers $\mathbf{Z}$, and the rationals $\mathbf{Q}$. One can form additional sets using
such operations as union and intersection (see Section 3.1), for instance
one could have a disconnected union of two intervals such as $(1, 2) \cup [3, 4]$,
or one could consider the set $[-1, 1] \cap \mathbf{Q}$ of rational numbers between
$-1$ and $1$ inclusive. Clearly there are infinitely many possibilities of sets
one could create by such operations.

Just as sequences of real numbers have limit points, sets of real 
numbers have adherent points , which we now define. 

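Definition 9.1.5 ($\varepsilon$-adherent points). Let $X$ be a subset of $\mathbf{R}$, let $\varepsilon > 0$,
and let $x \in \mathbf{R}$. We say that $x$ is $\varepsilon$-adherent to $X$ iff there exists a $y \in X$
which is $\varepsilon$-close to $x$ (i.e., $|x - y| \leq \varepsilon$).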

Remark 9.1.6. The terminology “$\varepsilon$-adherent” is not standard in the
literature. However, we shall shortly use it to define the notion of an
adherent point, which is standard.

Example 9.1.7. The point 1.1 is 0.5-adherent to the open interval 
(0,1), but is not 0.1-adherent to this interval (why?). The point 1.1 is 
0.5-adherent to the finite set {1,2,3}. The point 1 is 0.5-adherent to 
{1,2,3} (why?). 
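
The claims in Example 9.1.7 can be checked mechanically. The sketch below is an illustration under stated assumptions: the function names are hypothetical, exact rationals (`fractions.Fraction`) are used to avoid rounding, and the test for an open interval $(a, b)$ assumes $a < b$ and $\varepsilon > 0$, in which case some $y \in (a, b)$ satisfies $|x - y| \leq \varepsilon$ exactly when $x - \varepsilon < b$ and $x + \varepsilon > a$.

```python
from fractions import Fraction as F


def eps_adherent_finite(x, eps, points):
    """x is eps-adherent to a finite set iff some point y has |x - y| <= eps."""
    return any(abs(x - y) <= eps for y in points)


def eps_adherent_open_interval(x, eps, a, b):
    """x is eps-adherent to (a, b), assuming a < b and eps > 0.

    The closed window [x - eps, x + eps] meets the open interval (a, b)
    exactly when x - eps < b and x + eps > a.
    """
    return x - eps < b and x + eps > a


x = F(11, 10)  # the point 1.1
near_interval = eps_adherent_open_interval(x, F(1, 2), 0, 1)       # 0.5-adherent to (0, 1)
not_near_interval = eps_adherent_open_interval(x, F(1, 10), 0, 1)  # 0.1-adherent to (0, 1)?
near_finite = eps_adherent_finite(x, F(1, 2), [1, 2, 3])
```

The second check fails because the only candidates within distance 0.1 of 1.1 are the points $y \geq 1$, none of which lie in $(0, 1)$.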

Definition 9.1.8 (Adherent points). Let $X$ be a subset of $\mathbf{R}$, and let
$x \in \mathbf{R}$. We say that $x$ is an adherent point of $X$ iff it is $\varepsilon$-adherent to
$X$ for every $\varepsilon > 0$.

Example 9.1.9. The number 1 is $\varepsilon$-adherent to the open interval $(0, 1)$
for every $\varepsilon > 0$ (why?), and is thus an adherent point of $(0, 1)$. The
point 0.5 is similarly an adherent point of $(0, 1)$. However, the number
2 is not 0.5-adherent (for instance) to $(0, 1)$, and is thus not an adherent
point of $(0, 1)$.

Definition 9.1.10 (Closure). Let $X$ be a subset of $\mathbf{R}$. The closure of
$X$, sometimes denoted $\overline{X}$, is defined to be the set of all the adherent
points of $X$.

Lemma 9.1.11 (Elementary properties of closures). Let $X$ and $Y$ be
arbitrary subsets of $\mathbf{R}$. Then $X \subseteq \overline{X}$, $\overline{X \cup Y} = \overline{X} \cup \overline{Y}$, and
$\overline{X \cap Y} \subseteq \overline{X} \cap \overline{Y}$. If $X \subseteq Y$, then $\overline{X} \subseteq \overline{Y}$.

Proof. See Exercise 9.1.2. □ 

We now compute some closures. 

Lemma 9.1.12 (Closures of intervals). Let $a < b$ be real numbers, and
let $I$ be any one of the four intervals $(a, b)$, $(a, b]$, $[a, b)$, or $[a, b]$. Then
the closure of $I$ is $[a, b]$. Similarly, the closure of $(a, \infty)$ or $[a, \infty)$ is
$[a, \infty)$, while the closure of $(-\infty, a)$ or $(-\infty, a]$ is $(-\infty, a]$. Finally, the
closure of $(-\infty, \infty)$ is $(-\infty, \infty)$.

Proof. We will just show one of these facts, namely that the closure
of $(a, b)$ is $[a, b]$; the other results are proven similarly (or one can use
Exercise 9.1.1).

First let us show that every element of $[a, b]$ is adherent to $(a, b)$. Let
$x \in [a, b]$. If $x \in (a, b)$ then it is definitely adherent to $(a, b)$. If $x = b$
then $x$ is also adherent to $(a, b)$ (why?). Similarly when $x = a$. Thus
every point in $[a, b]$ is adherent to $(a, b)$.

Now we show that every point $x$ that is adherent to $(a, b)$ lies in $[a, b]$.
Suppose for sake of contradiction that $x$ does not lie in $[a, b]$; then either
$x > b$ or $x < a$. If $x > b$ then $x$ is not $(x - b)$-adherent to $(a, b)$ (why?),
and is hence not an adherent point of $(a, b)$. Similarly, if $x < a$, then $x$
is not $(a - x)$-adherent to $(a, b)$, and is hence not an adherent point of
$(a, b)$. This contradiction shows that $x$ is in fact in $[a, b]$ as claimed. □

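Lemma 9.1.13. The closure of $\mathbf{N}$ is $\mathbf{N}$. The closure of $\mathbf{Z}$ is $\mathbf{Z}$. The
closure of $\mathbf{Q}$ is $\mathbf{R}$, and the closure of $\mathbf{R}$ is $\mathbf{R}$. The closure of the empty
set $\emptyset$ is $\emptyset$.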

Proof. See Exercise 9.1.3. □ 

The following lemma shows that adherent points of a set X can be 
obtained as the limit of elements in X: 

Lemma 9.1.14. Let $X$ be a subset of $\mathbf{R}$, and let $x \in \mathbf{R}$. Then $x$ is
an adherent point of $X$ if and only if there exists a sequence $(a_n)_{n=0}^\infty$,
consisting entirely of elements in $X$, which converges to $x$.

Proof. See Exercise 9.1.5. □

Definition 9.1.15. A subset $E \subseteq \mathbf{R}$ is said to be closed if $\overline{E} = E$, or
in other words that $E$ contains all of its adherent points.

Examples 9.1.16. From Lemma 9.1.12 we see that if $a < b$ are real
numbers, then $[a, b]$, $[a, +\infty)$, $(-\infty, a]$, and $(-\infty, +\infty)$ are closed, while
$(a, b)$, $(a, b]$, $[a, b)$, $(a, +\infty)$, and $(-\infty, a)$ are not. From Lemma 9.1.13
we see that $\mathbf{N}$, $\mathbf{Z}$, $\mathbf{R}$, $\emptyset$ are closed, while $\mathbf{Q}$ is not.

From Lemma 9.1.14 we can define closure in terms of sequences: 

Corollary 9.1.17. Let $X$ be a subset of $\mathbf{R}$. If $X$ is closed, and $(a_n)_{n=0}^\infty$
is a convergent sequence consisting of elements in $X$, then $\lim_{n \to \infty} a_n$
also lies in $X$. Conversely, if it is true that every convergent sequence
$(a_n)_{n=0}^\infty$ of elements in $X$ has its limit in $X$ as well, then $X$ is necessarily
closed.

When we study differentiation in the next chapter, we shall need to 
replace the concept of an adherent point by the closely related notion of 
a limit point. 

Definition 9.1.18 (Limit points). Let $X$ be a subset of the real line.
We say that $x$ is a limit point (or a cluster point) of $X$ iff it is an adherent
point of $X \setminus \{x\}$. We say that $x$ is an isolated point of $X$ if $x \in X$ and
there exists some $\varepsilon > 0$ such that $|x - y| > \varepsilon$ for all $y \in X \setminus \{x\}$.

Example 9.1.19. Let $X$ be the set $X = (1, 2) \cup \{3\}$. Then 3 is an
adherent point of $X$, but it is not a limit point of $X$, since 3 is not
adherent to $X \setminus \{3\} = (1, 2)$; instead, 3 is an isolated point of $X$. On
the other hand, 2 is still a limit point of $X$, since 2 is adherent to
$X \setminus \{2\} = X$; but it is not isolated (why?).

Remark 9.1.20. From Lemma 9.1.14 we see that $x$ is a limit point of $X$
iff there exists a sequence $(a_n)_{n=0}^\infty$, consisting entirely of elements in $X$
that are distinct from $x$, and such that $(a_n)_{n=0}^\infty$ converges to $x$. It turns
out that the set of adherent points splits into the set of limit points and
the set of isolated points (Exercise 9.1.9).

Lemma 9.1.21. Let $I$ be an interval (possibly infinite), i.e., $I$ is a
set of the form $(a, b)$, $(a, b]$, $[a, b)$, $[a, b]$, $(a, +\infty)$, $[a, +\infty)$, $(-\infty, a)$, or
$(-\infty, a]$, with $a < b$ in the first four cases. Then every element of $I$ is
a limit point of $I$.

Proof. We show this for the case $I = [a, b]$; the other cases are similar
and are left to the reader. Let $x \in I$; we have to show that $x$ is a limit
point of $I$. There are three cases: $x = a$, $a < x < b$, and $x = b$. If $x = a$,
then consider the sequence $(x + \frac{1}{n})_{n=N}^\infty$. This sequence converges to $x$,
and will lie inside $I \setminus \{a\} = (a, b]$ if $N$ is chosen large enough (why?).
Thus by Remark 9.1.20 we see that $x = a$ is a limit point of $[a, b]$. A
similar argument works when $a < x < b$. When $x = b$ one has to use
the sequence $(x - \frac{1}{n})_{n=N}^\infty$ instead (why?) but the argument is otherwise
the same. □

Next, we define the concept of a bounded set. 

Definition 9.1.22 (Bounded sets). A subset $X$ of the real line is said
to be bounded if we have $X \subseteq [-M, M]$ for some real number $M > 0$.

Example 9.1.23. For any real numbers $a, b$, the interval $[a, b]$ is
bounded, because it is contained inside $[-M, M]$, where $M :=
\max(|a|, |b|)$. However, the half-infinite interval $[0, +\infty)$ is unbounded
(why?). In fact, no half-infinite interval or doubly-infinite interval can
be bounded. The sets $\mathbf{N}$, $\mathbf{Z}$, $\mathbf{Q}$, and $\mathbf{R}$ are all unbounded (why?).

A basic property of closed and bounded sets is the following. 

Theorem 9.1.24 (Heine-Borel theorem for the line). Let $X$ be a subset
of $\mathbf{R}$. Then the following two statements are equivalent:

(a) $X$ is closed and bounded.

(b) Given any sequence $(a_n)_{n=0}^\infty$ of real numbers which takes values in
$X$ (i.e., $a_n \in X$ for all $n$), there exists a subsequence $(a_{n_j})_{j=0}^\infty$ of
the original sequence, which converges to some number $L$ in $X$.

Proof. See Exercise 9.1.13. □ 

Remark 9.1.25. This theorem shall play a key role in subsequent sec- 
tions of this chapter. In the language of metric space topology, it asserts 
that every subset of the real line which is closed and bounded, is also 
compact; see Section 11.7. A more general version of this theorem, 
due to Eduard Heine (1821-1881) and Emile Borel (1871-1956), can be 
found in Theorem 11.7.7. 


— Exercises — 


Exercise 9.1.1. Let $X$ be any subset of the real line, and let $Y$ be a set such
that $X \subseteq Y \subseteq \overline{X}$. Show that $\overline{Y} = \overline{X}$.


Exercise 9.1.2. Prove Lemma 9.1.11.

Exercise 9.1.3. Prove Lemma 9.1.13. (Hint: for computing the closure of $\mathbf{Q}$,
you will need Proposition 5.4.14.)

Exercise 9.1.4. Give an example of two subsets $X$, $Y$ of the real line such that
$\overline{X \cap Y} \neq \overline{X} \cap \overline{Y}$.

Exercise 9.1.5. Prove Lemma 9.1.14. (Hint: in order to prove one of the two 
implications here you will need axiom of choice, as in Lemma 8.4.5.) 

Exercise 9.1.6. Let $X$ be a subset of $\mathbf{R}$. Show that $\overline{X}$ is closed (i.e.,
$\overline{\overline{X}} = \overline{X}$). Furthermore, show that if $Y$ is any closed set that contains
$X$, then $Y$ also contains $\overline{X}$. Thus the closure $\overline{X}$ of $X$ is the smallest
closed set which contains $X$.

Exercise 9.1.7. Let $n \geq 1$ be a positive integer, and let $X_1, \ldots, X_n$ be closed
subsets of $\mathbf{R}$. Show that $X_1 \cup X_2 \cup \ldots \cup X_n$ is also closed.

Exercise 9.1.8. Let $I$ be a set (possibly infinite), and for each $\alpha \in I$ let $X_\alpha$ be
a closed subset of $\mathbf{R}$. Show that the intersection $\bigcap_{\alpha \in I} X_\alpha$ (defined in (3.3)) is
also closed.

Exercise 9.1.9. Let $X$ be a subset of the real line, and $x$ be a real number.
Show that every adherent point of $X$ is either a limit point or an isolated point
of $X$, but cannot be both. Conversely, show that every limit point and every
isolated point of $X$ is an adherent point of $X$.

Exercise 9.1.10. If X is a non-empty subset of R, show that X is bounded if 
and only if inf(X) and sup(X) are finite. 

Exercise 9.1.11. Show that if $X$ is a bounded subset of $\mathbf{R}$, then the closure $\overline{X}$
is also bounded.

Exercise 9.1.12. Show that the union of any finite collection of bounded subsets 
of R is still a bounded set. Is this conclusion still true if one takes an infinite 
collection of bounded subsets of R? 

Exercise 9.1.13. Prove Theorem 9.1.24. (Hint: to show (a) implies (b), use the
Bolzano-Weierstrass theorem (Theorem 6.6.8) and Corollary 9.1.17. To show
(b) implies (a), argue by contradiction, using Corollary 9.1.17 to establish that
$X$ is closed. You will need the axiom of choice to show that $X$ is bounded, as
in Lemma 8.4.5.)

Exercise 9.1.14. Show that any finite subset of R is closed and bounded. 

Exercise 9.1.15. Let E be a bounded subset of R, and let S := sup (E) be the 
least upper bound of E. (Note from the least upper bound principle, Theorem 
5.5.9, that S is a real number.) Show that S is an adherent point of E, and is 
also an adherent point of R\E. 

9.2 The algebra of real-valued functions

You are familiar with many functions $f : \mathbf{R} \to \mathbf{R}$ from the real line to the
real line. Some examples are: $f(x) := x^2 + 3x + 5$; $f(x) := 2^x/(x^2 + 1)$;
$f(x) := \sin(x)\exp(x)$ (we will define sin and exp formally in Chapter
11.20). These are functions from $\mathbf{R}$ to $\mathbf{R}$ since to every real number $x$
they assign a single real number $f(x)$. We can also consider more exotic
functions, e.g.

$$f(x) := \begin{cases} 1 & \text{if } x \in \mathbf{Q} \\ 0 & \text{if } x \notin \mathbf{Q}. \end{cases}$$

This function is not algebraic (i.e., it cannot be expressed in terms of
$x$ purely by using the standard algebraic operations of $+$, $-$, $\times$, $/$, $\sqrt{\ }$,
etc.; we will not need this notion in this text), but it is still a function
from $\mathbf{R}$ to $\mathbf{R}$, because it still assigns a real number $f(x)$ to each $x \in \mathbf{R}$.

We can take any one of the previous functions $f : \mathbf{R} \to \mathbf{R}$ defined on
all of $\mathbf{R}$, and restrict the domain to a smaller set $X \subseteq \mathbf{R}$, creating a new
function, sometimes called $f|_X$, from $X$ to $\mathbf{R}$. This is the same function
as the original function $f$, but is only defined on a smaller domain. (Thus
$f|_X(x) := f(x)$ when $x \in X$, and $f|_X(x)$ is undefined when $x \notin X$.)
For instance, we can restrict the function $f(x) := x^2$, which is initially
defined from $\mathbf{R}$ to $\mathbf{R}$, to the interval $[1, 2]$, thus creating a new function
$f|_{[1,2]} : [1, 2] \to \mathbf{R}$, which is defined as $f|_{[1,2]}(x) = x^2$ when $x \in [1, 2]$ but
is undefined elsewhere.

One could also restrict the range from $\mathbf{R}$ to some smaller subset $Y$
of $\mathbf{R}$, provided of course that all the values of $f(x)$ lie inside $Y$. For
instance, the function $f : \mathbf{R} \to \mathbf{R}$ defined by $f(x) := x^2$ could also be
thought of as a function from $\mathbf{R}$ to $[0, \infty)$, instead of a function from
$\mathbf{R}$ to $\mathbf{R}$. Formally, these two functions are different functions, but the
distinction between them is so minor that we shall often be careless
about the range of a function in our discussion.

Strictly speaking, there is a distinction between a function $f$, and
its value $f(x)$ at a point $x$. Here $f$ is a function, but $f(x)$ is a number (which
depends on some free variable $x$). This distinction is rather subtle and
we will not stress it too much, but there are times when one has to
distinguish between the two. For instance, if $f : \mathbf{R} \to \mathbf{R}$ is the function
$f(x) := x^2$, and $g := f|_{[1,2]}$ is the restriction of $f$ to the interval $[1, 2]$,
then $f$ and $g$ both perform the operation of squaring, i.e., $f(x) = x^2$
and $g(x) = x^2$, but the two functions $f$ and $g$ are not considered the
same function, $f \neq g$, because they have different domains. Despite this
distinction, we shall often be careless, and say things like “consider the
function $x^2 + 2x + 3$” when really we should be saying “consider the
function $f : \mathbf{R} \to \mathbf{R}$ defined by $f(x) := x^2 + 2x + 3$”. (This distinction
makes more of a difference when we start doing things like differentiation.
For instance, if $f : \mathbf{R} \to \mathbf{R}$ is the function $f(x) = x^2$, then of course
$f(3) = 9$, but the derivative of $f$ at 3 is 6, whereas the derivative of 9 is
of course 0, so we cannot simply “differentiate both sides” of $f(3) = 9$
and conclude that $6 = 0$.)

If $X$ is a subset of $\mathbf{R}$, and $f : X \to \mathbf{R}$ is a function, we can form
the graph $\{(x, f(x)) : x \in X\}$ of the function $f$; this is a subset of
$X \times \mathbf{R}$, and hence a subset of the Euclidean plane $\mathbf{R}^2 = \mathbf{R} \times \mathbf{R}$. One
can certainly study a function through its graph, by using the geometry
of the plane $\mathbf{R}^2$ (e.g., employing such concepts as tangent lines, area,
and so forth). We however will pursue a more “analytic” approach, in
which we rely instead on the properties of the real numbers to analyze
these functions. The two approaches are complementary; the geometric
approach offers more visual intuition, while the analytic approach offers
rigour and precision. Both the geometric intuition and the analytic
formalism become useful when extending analysis of functions of one
variable to functions of many variables (or possibly even infinitely many
variables).

Just as numbers can be manipulated arithmetically, so can functions: 
the sum of two functions is a function, the product of two functions is a 
function, and so forth. 

Definition 9.2.1 (Arithmetic operations on functions). Given two functions
$f : X \to \mathbf{R}$ and $g : X \to \mathbf{R}$, we can define their sum $f + g : X \to \mathbf{R}$
by the formula

$$(f + g)(x) := f(x) + g(x),$$

their difference $f - g : X \to \mathbf{R}$ by the formula

$$(f - g)(x) := f(x) - g(x),$$

their maximum $\max(f, g) : X \to \mathbf{R}$ by

$$\max(f, g)(x) := \max(f(x), g(x)),$$

their minimum $\min(f, g) : X \to \mathbf{R}$ by

$$\min(f, g)(x) := \min(f(x), g(x)),$$

their product $fg : X \to \mathbf{R}$ (or $f \cdot g : X \to \mathbf{R}$) by the formula

$$(fg)(x) := f(x)g(x),$$

and (provided that $g(x) \neq 0$ for all $x \in X$) the quotient $f/g : X \to \mathbf{R}$
by the formula

$$(f/g)(x) := f(x)/g(x).$$

Finally, if $c$ is a real number, we can define the function $cf : X \to \mathbf{R}$ (or
$c \cdot f : X \to \mathbf{R}$) by the formula

$$(cf)(x) := c \times f(x).$$

Example 9.2.2. If $f : \mathbf{R} \to \mathbf{R}$ is the function $f(x) := x^2$, and $g :
\mathbf{R} \to \mathbf{R}$ is the function $g(x) := 2x$, then $f + g : \mathbf{R} \to \mathbf{R}$ is the function
$(f + g)(x) := x^2 + 2x$, while $fg : \mathbf{R} \to \mathbf{R}$ is the function $fg(x) = 2x^3$.
Similarly $f - g : \mathbf{R} \to \mathbf{R}$ is the function $(f - g)(x) := x^2 - 2x$, while
$6f : \mathbf{R} \to \mathbf{R}$ is the function $(6f)(x) = 6x^2$. Observe that $fg$ is not
the same function as $f \circ g$, which maps $x \mapsto 4x^2$, nor is it the same as
$g \circ f$, which maps $x \mapsto 2x^2$ (why?). Thus multiplication of functions and
composition of functions are two different operations.
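
The pointwise operations of Definition 9.2.1 translate directly into higher-order functions. The sketch below is illustrative only; the helper names `add`, `mul`, `scale`, and `compose` are our own and are not notation from the text. It reproduces the functions of Example 9.2.2 and makes the difference between the product $fg$ and the compositions $f \circ g$, $g \circ f$ visible.

```python
def add(f, g):
    """Pointwise sum: (f + g)(x) := f(x) + g(x)."""
    return lambda x: f(x) + g(x)


def mul(f, g):
    """Pointwise product: (fg)(x) := f(x) g(x)."""
    return lambda x: f(x) * g(x)


def scale(c, f):
    """Scalar multiple: (cf)(x) := c * f(x)."""
    return lambda x: c * f(x)


def compose(f, g):
    """Composition: (f o g)(x) := f(g(x)). This is not the pointwise product."""
    return lambda x: f(g(x))


f = lambda x: x ** 2   # f(x) := x^2
g = lambda x: 2 * x    # g(x) := 2x

fg = mul(f, g)             # x -> 2x^3
f_after_g = compose(f, g)  # x -> (2x)^2 = 4x^2
g_after_f = compose(g, f)  # x -> 2x^2
```

Evaluating the three functions at a single point such as $x = 3$ (giving 54, 36, and 18 respectively) already shows that they are distinct.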

— Exercises — 

Exercise 9.2.1. Let $f : \mathbf{R} \to \mathbf{R}$, $g : \mathbf{R} \to \mathbf{R}$, $h : \mathbf{R} \to \mathbf{R}$. Which of the following
identities are true, and which ones are false? In the former case, give a proof;
in the latter case, give a counterexample.

$$(f + g) \circ h = (f \circ h) + (g \circ h)$$
$$f \circ (g + h) = (f \circ g) + (f \circ h)$$
$$(f + g) \cdot h = (f \cdot h) + (g \cdot h)$$
$$f \cdot (g + h) = (f \cdot g) + (f \cdot h)$$

9.3 Limiting values of functions 

In Chapter 6 we defined what it means for a sequence $(a_n)_{n=0}^\infty$ to converge
to a limit $L$. We now define a similar notion for what it means for a
function $f$ defined on the real line, or on some subset of the real line,
to converge to some value at a point. Just as we used the notions of
$\varepsilon$-closeness and eventual $\varepsilon$-closeness to deal with limits of sequences, we
shall need a notion of $\varepsilon$-closeness and local $\varepsilon$-closeness to deal with limits
of functions.

Definition 9.3.1 ($\varepsilon$-closeness). Let $X$ be a subset of $\mathbf{R}$, let $f : X \to \mathbf{R}$
be a function, let $L$ be a real number, and let $\varepsilon > 0$ be a real number.
We say that the function $f$ is $\varepsilon$-close to $L$ iff $f(x)$ is $\varepsilon$-close to $L$ for
every $x \in X$.

Example 9.3.2. When the function $f(x) := x^2$ is restricted to the
interval $[1, 3]$, then it is 5-close to 4, since when $x \in [1, 3]$ then $1 \leq
f(x) \leq 9$, and hence $|f(x) - 4| \leq 5$. When instead it is restricted to the
smaller interval $[1.9, 2.1]$, then it is 0.41-close to 4, since if $x \in [1.9, 2.1]$,
then $3.61 \leq f(x) \leq 4.41$, and hence $|f(x) - 4| \leq 0.41$.

Definition 9.3.3 (Local $\varepsilon$-closeness). Let $X$ be a subset of $\mathbf{R}$, let $f :
X \to \mathbf{R}$ be a function, let $L$ be a real number, let $x_0$ be an adherent point
of $X$, and let $\varepsilon > 0$ be a real number. We say that $f$ is $\varepsilon$-close to $L$ near $x_0$
iff there exists a $\delta > 0$ such that $f$ becomes $\varepsilon$-close to $L$ when restricted
to the set $\{x \in X : |x - x_0| < \delta\}$.

Example 9.3.4. Let $f : [1, 3] \to \mathbf{R}$ be the function $f(x) := x^2$,
restricted to the interval $[1, 3]$. This function is not 0.1-close to 4, since
for instance $f(1)$ is not 0.1-close to 4. However, $f$ is 0.1-close to 4 near 2,
since when restricted to the set $\{x \in [1, 3] : |x - 2| < 0.01\}$, the function
$f$ is indeed 0.1-close to 4. This is because when $|x - 2| < 0.01$, we have
$1.99 < x < 2.01$, and hence $3.9601 < f(x) < 4.0401$, and in particular
$f(x)$ is 0.1-close to 4.

Example 9.3.5. Continuing with the same function $f$ used in the previous
example, we observe that $f$ is not 0.1-close to 9, since for instance
$f(1)$ is not 0.1-close to 9. However, $f$ is 0.1-close to 9 near 3, since when
restricted to the set $\{x \in [1, 3] : |x - 3| < 0.01\}$ (which is the same as
the half-open interval $(2.99, 3]$; why?), the function $f$ becomes 0.1-close
to 9 (since if $2.99 < x \leq 3$, then $8.9401 < f(x) \leq 9$, and hence $f(x)$ is
0.1-close to 9).
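
The numerical bounds in Examples 9.3.2, 9.3.4 and 9.3.5 can be reproduced exactly. The sketch below rests on an assumption worth stating: because $f(x) = x^2$ is increasing on $[1, 3]$, the largest value of $|f(x) - L|$ over a closed subinterval is attained at one of the endpoints. The helper name `max_deviation_increasing` is hypothetical, and exact rationals are used so that no rounding enters.

```python
from fractions import Fraction as F


def max_deviation_increasing(f, lo, hi, L):
    """Largest |f(x) - L| over [lo, hi], valid only for a monotone increasing f.

    For increasing f, f(lo) <= f(x) <= f(hi), so the deviation from L
    is maximised at an endpoint.
    """
    return max(abs(f(lo) - L), abs(f(hi) - L))


square = lambda x: x * x

on_1_3 = max_deviation_increasing(square, F(1), F(3), 4)               # Example 9.3.2
on_19_21 = max_deviation_increasing(square, F(19, 10), F(21, 10), 4)   # Example 9.3.2
near_2 = max_deviation_increasing(square, F(199, 100), F(201, 100), 4) # Example 9.3.4
near_3 = max_deviation_increasing(square, F(299, 100), F(3), 9)        # Example 9.3.5
```

The last two values bound the deviation on the closed intervals $[1.99, 2.01]$ and $[2.99, 3]$, which contain the sets used in the examples, so they are (slightly generous) upper bounds for the open-ended sets considered there.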

Definition 9.3.6 (Convergence of functions at a point). Let $X$ be a
subset of $\mathbf{R}$, let $f : X \to \mathbf{R}$ be a function, let $E$ be a subset of $X$, let $x_0$
be an adherent point of $E$, and let $L$ be a real number. We say that $f$
converges to $L$ at $x_0$ in $E$, and write $\lim_{x \to x_0; x \in E} f(x) = L$, iff $f$, after
restricting to $E$, is $\varepsilon$-close to $L$ near $x_0$ for every $\varepsilon > 0$. If $f$ does not
converge to any number $L$ at $x_0$, we say that $f$ diverges at $x_0$, and leave
$\lim_{x \to x_0; x \in E} f(x)$ undefined.

In other words, we have $\lim_{x \to x_0; x \in E} f(x) = L$ iff for every $\varepsilon > 0$,
there exists a $\delta > 0$ such that $|f(x) - L| \leq \varepsilon$ for all $x \in E$ such that
$|x - x_0| < \delta$. (Why is this definition equivalent to the one given above?)

Remark 9.3.7. In many cases we will omit the set $E$ from the above
notation (i.e., we will just say that $f$ converges to $L$ at $x_0$, or that
$\lim_{x \to x_0} f(x) = L$), although this is slightly dangerous. For instance, it
sometimes makes a difference whether $E$ actually contains $x_0$ or not.
To give an example, if $f : \mathbf{R} \to \mathbf{R}$ is the function defined by setting
$f(x) = 1$ when $x = 0$ and $f(x) = 0$ when $x \neq 0$, then one
has $\lim_{x \to 0; x \in \mathbf{R} \setminus \{0\}} f(x) = 0$, but $\lim_{x \to 0; x \in \mathbf{R}} f(x)$ is undefined. Some
authors only define the limit $\lim_{x \to x_0; x \in E} f(x)$ when $E$ does not contain
$x_0$ (so that $x_0$ is now a limit point of $E$ rather than an adherent
point), or would use $\lim_{x \to x_0; x \in E} f(x)$ to denote what we would call
$\lim_{x \to x_0; x \in E \setminus \{x_0\}} f(x)$, but we have chosen a slightly more general notation,
which allows the possibility that $E$ contains $x_0$.

Example 9.3.8. Let $f : [1, 3] \to \mathbf{R}$ be the function $f(x) := x^2$. We
have seen before that $f$ is 0.1-close to 4 near 2. A similar argument
shows that $f$ is 0.01-close to 4 near 2 (one just has to pick a smaller
value of $\delta$).
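
To make "pick a smaller value of $\delta$" concrete, here is one possible recipe for $f(x) = x^2$ at $x_0 = 2$. It is our own choice rather than one given in the text: take $\delta := \min(1, \varepsilon/5)$. If $|x - 2| < \delta \leq 1$ then $1 < x < 3$, so $|x + 2| < 5$ and $|x^2 - 4| = |x - 2||x + 2| < 5\delta \leq \varepsilon$. The function name `delta_for` is hypothetical.

```python
from fractions import Fraction as F


def delta_for(eps):
    """One admissible delta for f(x) = x^2 at x0 = 2 (an illustrative choice).

    With delta = min(1, eps/5): |x - 2| < delta forces |x + 2| < 5,
    hence |x^2 - 4| = |x - 2| * |x + 2| < 5 * delta <= eps.
    """
    return min(F(1), eps / 5)


def worst_sampled_gap(eps):
    """Largest |x^2 - 4| over a few sample points with |x - 2| < delta_for(eps)."""
    d = delta_for(eps)
    fractions_of_delta = (F(1, 2), F(9, 10), F(999, 1000))
    points = [2 + s * t * d for t in fractions_of_delta for s in (1, -1)]
    return max(abs(x * x - 4) for x in points)
```

Sampling finitely many points is of course only a sanity check, not a proof; the inequality in the lead-in is what guarantees the bound for every admissible $x$.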

Definition 9.3.6 is rather unwieldy. However, we can rewrite this 
definition in terms of a more familiar one, involving limits of sequences. 

Proposition 9.3.9. Let $X$ be a subset of $\mathbf{R}$, let $f : X \to \mathbf{R}$ be a
function, let $E$ be a subset of $X$, let $x_0$ be an adherent point of $E$, and
let $L$ be a real number. Then the following two statements are logically
equivalent:

(a) $f$ converges to $L$ at $x_0$ in $E$.

(b) For every sequence $(a_n)_{n=0}^\infty$ which consists entirely of elements of
$E$ and converges to $x_0$, the sequence $(f(a_n))_{n=0}^\infty$ converges to $L$.

Proof. See Exercise 9.3.1. □ 

In view of the above proposition, we will sometimes write “$f(x) \to L$
as $x \to x_0$ in $E$” or “$f$ has a limit $L$ at $x_0$ in $E$” instead of “$f$ converges
to $L$ at $x_0$”, or “$\lim_{x \to x_0} f(x) = L$”.

Remark 9.3.10. With the notation of Proposition 9.3.9, we have the
following corollary: if $\lim_{x \to x_0; x \in E} f(x) = L$, and $\lim_{n \to \infty} a_n = x_0$, then
$\lim_{n \to \infty} f(a_n) = L$.

Remark 9.3.11. We only consider limits of a function $f$ at $x_0$ in the
case when $x_0$ is an adherent point of $E$. When $x_0$ is not an adherent
point then it is not worth it to define the concept of a limit. (Can you
see why there will be problems?)

Remark 9.3.12. The variable $x$ used to denote a limit is a dummy
variable; we could replace it by any other variable and obtain exactly
the same limit. For instance, if $\lim_{x \to x_0; x \in E} f(x) = L$, then
$\lim_{y \to x_0; y \in E} f(y) = L$, and conversely (why?).

Proposition 9.3.9 has some immediate corollaries. For instance, we 
now know that a function can have at most one limit at each point: 

Corollary 9.3.13. Let $X$ be a subset of $\mathbf{R}$, let $E$ be a subset of $X$, let
$x_0$ be an adherent point of $E$, and let $f : X \to \mathbf{R}$ be a function. Then $f$
can have at most one limit at $x_0$ in $E$.

Proof. Suppose for sake of contradiction that there are two distinct numbers
$L$ and $L'$ such that $f$ has a limit $L$ at $x_0$ in $E$, and such that $f$ also
has a limit $L'$ at $x_0$ in $E$. Since $x_0$ is an adherent point of $E$, we know by
Lemma 9.1.14 that there is a sequence $(a_n)_{n=0}^\infty$ consisting of elements in
$E$ which converges to $x_0$. Since $f$ has a limit $L$ at $x_0$ in $E$, we thus see
by Proposition 9.3.9 that $(f(a_n))_{n=0}^\infty$ converges to $L$. But since $f$ also
has a limit $L'$ at $x_0$ in $E$, we see that $(f(a_n))_{n=0}^\infty$ also converges to $L'$.
But this contradicts the uniqueness of limits of sequences (Proposition
6.1.7). □

Using the limit laws for sequences, one can now deduce the limit laws 
for functions: 

Proposition 9.3.14 (Limit laws for functions). Let $X$ be a subset of
$\mathbf{R}$, let $E$ be a subset of $X$, let $x_0$ be an adherent point of $E$, and let
$f : X \to \mathbf{R}$ and $g : X \to \mathbf{R}$ be functions. Suppose that $f$ has a limit $L$
at $x_0$ in $E$, and $g$ has a limit $M$ at $x_0$ in $E$. Then $f + g$ has a limit
$L + M$ at $x_0$ in $E$, $f - g$ has a limit $L - M$ at $x_0$ in $E$, $\max(f, g)$ has
a limit $\max(L, M)$ at $x_0$ in $E$, $\min(f, g)$ has a limit $\min(L, M)$ at $x_0$ in
$E$, and $fg$ has a limit $LM$ at $x_0$ in $E$. If $c$ is a real number, then $cf$ has
a limit $cL$ at $x_0$ in $E$. Finally, if $g$ is non-zero on $E$ (i.e., $g(x) \neq 0$ for
all $x \in E$) and $M$ is non-zero, then $f/g$ has a limit $L/M$ at $x_0$ in $E$.

Proof. We just prove the first claim (that $f + g$ has a limit $L + M$);
the others are very similar and are left to Exercise 9.3.2. Since $x_0$ is an
adherent point of $E$, we know by Lemma 9.1.14 that there is a sequence
$(a_n)_{n=0}^\infty$ consisting of elements in $E$, which converges to $x_0$. Since $f$ has
a limit $L$ at $x_0$ in $E$, we thus see by Proposition 9.3.9 that $(f(a_n))_{n=0}^\infty$
converges to $L$. Similarly $(g(a_n))_{n=0}^\infty$ converges to $M$. By the limit
laws for sequences (Theorem 6.1.19) we conclude that $((f + g)(a_n))_{n=0}^\infty$
converges to $L + M$. By Proposition 9.3.9 again, this implies that $f + g$
has a limit $L + M$ at $x_0$ in $E$ as desired (since $(a_n)_{n=0}^\infty$ was an arbitrary
sequence in $E$ converging to $x_0$). □

Remark 9.3.15. One can phrase Proposition 9.3.14 more informally as
saying that

$$\lim_{x \to x_0} (f \pm g)(x) = \lim_{x \to x_0} f(x) \pm \lim_{x \to x_0} g(x)$$

$$\lim_{x \to x_0} \max(f, g)(x) = \max\left( \lim_{x \to x_0} f(x), \lim_{x \to x_0} g(x) \right)$$

$$\lim_{x \to x_0} \min(f, g)(x) = \min\left( \lim_{x \to x_0} f(x), \lim_{x \to x_0} g(x) \right)$$

$$\lim_{x \to x_0} (fg)(x) = \lim_{x \to x_0} f(x) \lim_{x \to x_0} g(x)$$

$$\lim_{x \to x_0} (f/g)(x) = \frac{\lim_{x \to x_0} f(x)}{\lim_{x \to x_0} g(x)}$$

(where we have dropped the restriction $x \in E$ for brevity) but bear
in mind that these identities are only true when the right-hand side
makes sense, and furthermore for the final identity we need $g$ to be
non-zero, and also $\lim_{x \to x_0} g(x)$ to be non-zero. (See Example 1.2.4 for some
examples of what goes wrong when limits are manipulated carelessly.)


Using the limit laws in Proposition 9.3.14 we can already deduce
several limits. First of all, it is easy to check the basic limits

$$\lim_{x \to x_0; x \in \mathbf{R}} c = c$$

and

$$\lim_{x \to x_0; x \in \mathbf{R}} x = x_0$$

for any real numbers $x_0$ and $c$. (Why? Use Proposition 9.3.9.) By the
limit laws we can thus conclude that

$$\lim_{x \to x_0; x \in \mathbf{R}} x^2 = x_0^2$$

$$\lim_{x \to x_0; x \in \mathbf{R}} cx = cx_0$$

$$\lim_{x \to x_0; x \in \mathbf{R}} x^2 + cx + d = x_0^2 + cx_0 + d$$

etc., where $c, d$ are arbitrary real numbers.

If $f$ converges to $L$ at $x_0$ in $X$, and $Y$ is any subset of $X$ such that
$x_0$ is still an adherent point of $Y$, then $f$ will also converge to $L$ at $x_0$
in $Y$ (why?). Thus convergence on a large set implies convergence on a
smaller set. The converse, however, is not true:


Example 9.3.16. Consider the signum function $\operatorname{sgn} : \mathbf{R} \to \mathbf{R}$, defined
by

$$\operatorname{sgn}(x) := \begin{cases} 1 & \text{if } x > 0 \\ 0 & \text{if } x = 0 \\ -1 & \text{if } x < 0. \end{cases}$$

Then $\lim_{x \to 0; x \in (0, \infty)} \operatorname{sgn}(x) = 1$ (why?), whereas $\lim_{x \to 0; x \in (-\infty, 0)} \operatorname{sgn}(x) = -1$
(why?) and $\lim_{x \to 0; x \in \mathbf{R}} \operatorname{sgn}(x)$ is undefined (why?). Thus it is sometimes
dangerous to drop the set $X$ from the notation of limit. However,
in many cases it is safe to do so; for instance, since we know that
$\lim_{x \to x_0; x \in \mathbf{R}} x^2 = x_0^2$, we know in fact that $\lim_{x \to x_0; x \in X} x^2 = x_0^2$ for any
set $X$ with $x_0$ as an adherent point (why?). Thus it is safe to write
$\lim_{x \to x_0} x^2 = x_0^2$.
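
Proposition 9.3.9 gives a practical way to see why the three limits in Example 9.3.16 behave differently: one evaluates $\operatorname{sgn}$ along sequences converging to 0 from within each set. The sketch below is only suggestive, since finitely many terms cannot establish a limit; the particular sequences $1/n$, $-1/n$ and $(-1)^n/n$ are our own illustrative choices.

```python
from fractions import Fraction as F


def sgn(x):
    """Signum: 1 for x > 0, 0 for x = 0, -1 for x < 0."""
    return (x > 0) - (x < 0)


N = 50
from_right = [sgn(F(1, n)) for n in range(1, N)]             # a_n = 1/n lies in (0, oo)
from_left = [sgn(F(-1, n)) for n in range(1, N)]             # a_n = -1/n lies in (-oo, 0)
alternating = [sgn(F((-1) ** n, n)) for n in range(1, N)]    # a_n = (-1)^n / n lies in R
```

Along the first two sequences the values are constantly $1$ and $-1$ respectively, while along the third they alternate between $-1$ and $1$ and so cannot converge; this is the sequential obstruction to the limit in $\mathbf{R}$.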

Example 9.3.17. Let $f(x)$ be the function

$$f(x) := \begin{cases} 1 & \text{if } x = 0 \\ 0 & \text{if } x \neq 0. \end{cases}$$

Then $\lim_{x \to 0; x \in \mathbf{R} \setminus \{0\}} f(x) = 0$ (why?), but $\lim_{x \to 0; x \in \mathbf{R}} f(x)$ is undefined
(why?). (When this happens, we say that $f$ has a “removable singularity”
or “removable discontinuity” at 0. Because of such singularities, it is
sometimes the convention when writing $\lim_{x \to x_0} f(x)$ to automatically
exclude $x_0$ from the set; for instance, in the textbook, $\lim_{x \to x_0} f(x)$ is
used as shorthand for $\lim_{x \to x_0; x \in X \setminus \{x_0\}} f(x)$.)

On the other hand, the limit at $x_0$ should only depend on the values
of the function near $x_0$; the values away from $x_0$ are not relevant. The
following proposition reflects this intuition:


Proposition 9.3.18 (Limits are local). Let $X$ be a subset of $\mathbf{R}$, let $E$
be a subset of $X$, let $x_0$ be an adherent point of $E$, let $f : X \to \mathbf{R}$ be a
function, and let $L$ be a real number. Let $\delta > 0$. Then we have

$$\lim_{x \to x_0; x \in E} f(x) = L$$

if and only if

$$\lim_{x \to x_0; x \in E \cap (x_0 - \delta, x_0 + \delta)} f(x) = L.$$

Proof. See Exercise 9.3.3. □

Informally, the above proposition asserts that

$$\lim_{x \to x_0; x \in E} f(x) = \lim_{x \to x_0; x \in E \cap (x_0 - \delta, x_0 + \delta)} f(x).$$

Thus the limit of a function at $x_0$, if it exists, only depends on the values
of $f$ near $x_0$; the values far away do not actually influence the limit.

We now give a few more examples of limits. 

Example 9.3.19. Consider the functions $f : \mathbf{R} \to \mathbf{R}$ and $g : \mathbf{R} \to \mathbf{R}$
defined by $f(x) := x + 2$ and $g(x) := x + 1$. Then $\lim_{x \to 2; x \in \mathbf{R}} f(x) = 4$
and $\lim_{x \to 2; x \in \mathbf{R}} g(x) = 3$. We would like to use the limit laws to
conclude that $\lim_{x \to 2; x \in \mathbf{R}} f(x)/g(x) = 4/3$, or in other words that
$\lim_{x \to 2; x \in \mathbf{R}} \frac{x+2}{x+1} = \frac{4}{3}$. Strictly speaking, we cannot use Proposition 9.3.14
to ensure this, because $x + 1$ is zero at $x = -1$, and so $f(x)/g(x)$ is not
defined there. However, this is easily solved, by restricting the domain of $f$
and $g$ from $\mathbf{R}$ to a smaller domain, such as $\mathbf{R} \setminus \{-1\}$. Then Proposition
9.3.14 does apply, and we have $\lim_{x \to 2; x \in \mathbf{R} \setminus \{-1\}} \frac{x+2}{x+1} = \frac{4}{3}$.

Example 9.3.20. Consider the function $f : \mathbf{R} \setminus \{1\} \to \mathbf{R}$ defined
by $f(x) := (x^2 - 1)/(x - 1)$. This function is well-defined for every
real number except 1, so $f(1)$ is undefined. However, 1 is still an
adherent point of $\mathbf{R} \setminus \{1\}$ (why?), and the limit $\lim_{x \to 1; x \in \mathbf{R} \setminus \{1\}} f(x)$ is
still defined. This is because on the domain $\mathbf{R} \setminus \{1\}$ we have the
identity $(x^2 - 1)/(x - 1) = (x + 1)(x - 1)/(x - 1) = x + 1$, and
$\lim_{x \to 1; x \in \mathbf{R} \setminus \{1\}} x + 1 = 2$.
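
The behaviour in Example 9.3.20 is easy to observe numerically. In the sketch below, which is an illustration rather than a proof, the function refuses the input 1 (where it is undefined) and is evaluated at the sample points $1 \pm 10^{-k}$, which are our own choice; exact rational arithmetic is used so that the identity $f(x) = x + 1$ can be tested without rounding error.

```python
from fractions import Fraction as F


def f(x):
    """f(x) := (x^2 - 1)/(x - 1) on the domain R \\ {1}."""
    if x == 1:
        raise ValueError("f is undefined at 1")
    return (x * x - 1) / (x - 1)


# Sample points 1 + 10^-k and 1 - 10^-k for k = 1, ..., 5 (all different from 1).
samples = [1 + F(s, 10 ** k) for k in range(1, 6) for s in (1, -1)]

# Distance from f(x) to the claimed limit 2 at each sample point.
gaps = [abs(f(x) - 2) for x in samples]
```

At each sample the gap $|f(x) - 2|$ equals $|x - 1|$ exactly, so it shrinks at the same rate as the distance of $x$ from 1.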

Example 9.3.21. Let $f : \mathbf{R} \to \mathbf{R}$ be the function

$$f(x) := \begin{cases} 1 & \text{if } x \in \mathbf{Q} \\ 0 & \text{if } x \notin \mathbf{Q}. \end{cases}$$

We will show that $f(x)$ has no limit at 0 in $\mathbf{R}$. Suppose for sake of
contradiction that $f(x)$ had some limit $L$ at 0 in $\mathbf{R}$. Then we would
have $\lim_{n \to \infty} f(a_n) = L$ whenever $(a_n)_{n=0}^\infty$ is a sequence of non-zero
numbers converging to 0. Since $(1/n)_{n=1}^\infty$ is such a sequence, we would
have

$$L = \lim_{n \to \infty} f(1/n) = \lim_{n \to \infty} 1 = 1.$$

On the other hand, since $(\sqrt{2}/n)_{n=1}^\infty$ is another sequence of non-zero
numbers converging to 0 (but now these numbers are irrational instead
of rational), we have

$$L = \lim_{n \to \infty} f(\sqrt{2}/n) = \lim_{n \to \infty} 0 = 0.$$

Since $1 \neq 0$, we have a contradiction. Thus this function does not have
a limit at 0.
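
Floating-point numbers cannot distinguish rationals from irrationals, so to experiment with this example one needs an exact model. The sketch below uses one such model, which is entirely our own assumption: numbers of the form $a + b\sqrt{2}$ with rational $a, b$, which are rational exactly when $b = 0$ (because $\sqrt{2}$ is irrational). The class name `QSqrt2` and its methods are hypothetical.

```python
from fractions import Fraction as F
from typing import NamedTuple


class QSqrt2(NamedTuple):
    """The real number a + b*sqrt(2) with rational a and b (illustrative model)."""
    a: F
    b: F

    def is_rational(self) -> bool:
        # a + b*sqrt(2) is rational iff b == 0, since sqrt(2) is irrational.
        return self.b == 0

    def approx(self) -> float:
        # Floating-point approximation, used only to see the size of the number.
        return float(self.a) + float(self.b) * 2 ** 0.5


def dirichlet(x: QSqrt2) -> int:
    """f(x) := 1 if x is rational, 0 otherwise."""
    return 1 if x.is_rational() else 0


rational_seq = [QSqrt2(F(1, n), F(0)) for n in range(1, 30)]    # 1/n
irrational_seq = [QSqrt2(F(0), F(1, n)) for n in range(1, 30)]  # sqrt(2)/n
```

Both sequences shrink towards 0, yet `dirichlet` is constantly 1 on the first and constantly 0 on the second, which is the conflict exploited in the example.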


— Exercises — 

Exercise 9.3.1. Prove Proposition 9.3.9. 

Exercise 9.3.2. Prove the remaining claims in Proposition 9.3.14. 

Exercise 9.3.3. Prove Proposition 9.3.18.

Exercise 9.3.4. Propose a definition for limit superior $\limsup_{x \to x_0; x \in E} f(x)$ and
limit inferior $\liminf_{x \to x_0; x \in E} f(x)$, and then propose an analogue of Proposition
9.3.9 for your definition. (For an additional challenge: prove that analogue.)

Exercise 9.3.5. (Continuous version of squeeze test) Let $X$ be a subset of $\mathbf{R}$,
let $E$ be a subset of $X$, let $x_0$ be an adherent point of $E$, and let $f : X \to \mathbf{R}$,
$g : X \to \mathbf{R}$, $h : X \to \mathbf{R}$ be functions such that $f(x) \leq g(x) \leq h(x)$ for all
$x \in E$. If we have $\lim_{x \to x_0; x \in E} f(x) = \lim_{x \to x_0; x \in E} h(x) = L$ for some real
number $L$, show that $\lim_{x \to x_0; x \in E} g(x) = L$.


9.4 Continuous functions 


We now introduce one of the most fundamental notions in the theory of 
functions - that of continuity. 

Definition 9.4.1 (Continuity). Let $X$ be a subset of $\mathbf{R}$, and let
$f : X \to \mathbf{R}$ be a function. Let $x_0$ be an element of $X$. We say that $f$ is
continuous at $x_0$ iff we have

$$\lim_{x \to x_0; x \in X} f(x) = f(x_0);$$

in other words, the limit of $f(x)$ as $x$ converges to $x_0$ in $X$ exists and is
equal to $f(x_0)$. We say that $f$ is continuous on $X$ (or simply continuous)
iff $f$ is continuous at $x_0$ for every $x_0 \in X$. We say that $f$ is discontinuous
at $x_0$ iff it is not continuous at $x_0$.


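Example 9.4.2. Let $c$ be a real number, and let $f : \mathbf{R} \to \mathbf{R}$ be the
constant function $f(x) := c$. Then for every real number $x_0 \in \mathbf{R}$, we
have

$$\lim_{x \to x_0; x \in \mathbf{R}} f(x) = \lim_{x \to x_0; x \in \mathbf{R}} c = c = f(x_0),$$

thus $f$ is continuous at every point $x_0 \in \mathbf{R}$, or in other words $f$ is
continuous on $\mathbf{R}$.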





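Example 9.4.3. Let $f : \mathbf{R} \to \mathbf{R}$ be the identity function $f(x) := x$.
Then for every real number $x_0 \in \mathbf{R}$, we have

$$\lim_{x \to x_0; x \in \mathbf{R}} f(x) = \lim_{x \to x_0; x \in \mathbf{R}} x = x_0 = f(x_0),$$

thus $f$ is continuous at every point $x_0 \in \mathbf{R}$, or in other words $f$ is
continuous on $\mathbf{R}$.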


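Example 9.4.4. Let $\operatorname{sgn} : \mathbf{R} \to \mathbf{R}$ be the signum function defined in
Example 9.3.16. Then $\operatorname{sgn}(x)$ is continuous at every non-zero value of
$x$; for instance, at 1, we have (using Proposition 9.3.18)

$$\begin{aligned}
\lim_{x \to 1; x \in \mathbf{R}} \operatorname{sgn}(x) &= \lim_{x \to 1; x \in (0.9, 1.1)} \operatorname{sgn}(x) \\
&= \lim_{x \to 1; x \in (0.9, 1.1)} 1 \\
&= 1 \\
&= \operatorname{sgn}(1).
\end{aligned}$$

On the other hand, $\operatorname{sgn}$ is not continuous at 0, since the limit
$\lim_{x \to 0; x \in \mathbf{R}} \operatorname{sgn}(x)$ does not exist.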

Example 9.4.5. Let $f : \mathbf{R} \to \mathbf{R}$ be the function

$$f(x) := \begin{cases} 1 & \text{if } x \in \mathbf{Q} \\ 0 & \text{if } x \notin \mathbf{Q}. \end{cases}$$

Then by the discussion in the previous section, $f$ is not continuous at
0. In fact, it turns out that $f$ is not continuous at any real number $x_0$
(can you see why?).

Example 9.4.6. Let $f : \mathbf{R} \to \mathbf{R}$ be the function

$$f(x) := \begin{cases} 1 & \text{if } x \geq 0 \\ 0 & \text{if } x < 0. \end{cases}$$

Then $f$ is continuous at every non-zero real number (why?), but is not
continuous at 0. However, if we restrict $f$ to the right-hand line $[0, \infty)$,
then the resulting function $f|_{[0,\infty)}$ now becomes continuous everywhere
in its domain, including 0. Thus restricting the domain of a function
can make a discontinuous function continuous again.

There are several ways to phrase the statement that "$f$ is continuous
at $x_0$":

Proposition 9.4.7 (Equivalent formulations of continuity). Let $X$ be a
subset of $\mathbf{R}$, let $f : X \to \mathbf{R}$ be a function, and let $x_0$ be an element of
$X$. Then the following four statements are logically equivalent:

(a) $f$ is continuous at $x_0$.

(b) For every sequence $(a_n)_{n=0}^\infty$ consisting of elements of $X$ with
$\lim_{n \to \infty} a_n = x_0$, we have $\lim_{n \to \infty} f(a_n) = f(x_0)$.

(c) For every $\varepsilon > 0$, there exists a $\delta > 0$ such that $|f(x) - f(x_0)| < \varepsilon$
for all $x \in X$ with $|x - x_0| < \delta$.

(d) For every $\varepsilon > 0$, there exists a $\delta > 0$ such that $|f(x) - f(x_0)| \leq \varepsilon$
for all $x \in X$ with $|x - x_0| \leq \delta$.

Proof. See Exercise 9.4.1. □ 
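
To make formulation (c) concrete, here is a small Python sketch for the particular function $f(x) = x^2$ (our choice of example, not taken from the text). The function names `delta_for_square` and `check` are hypothetical, and the grid test only spot-checks finitely many points; the justification for the choice of $\delta$ is the inequality in the docstring.

```python
def delta_for_square(x0: float, eps: float) -> float:
    """A delta that works in formulation (c) for f(x) = x^2 at x0.

    If |x - x0| < delta <= 1 then |x + x0| <= 2|x0| + 1, so
    |x^2 - x0^2| = |x - x0| |x + x0| < delta * (2|x0| + 1) <= eps.
    """
    return min(1.0, eps / (2 * abs(x0) + 1))


def check(x0: float, eps: float, steps: int = 1000) -> bool:
    """Spot-check |f(x) - f(x0)| < eps on a grid of x with |x - x0| < delta."""
    delta = delta_for_square(x0, eps)
    for i in range(1, steps):
        # Offsets strictly inside the interval (-delta, delta).
        h = delta * (2 * i / steps - 1)
        x = x0 + h
        if abs(x * x - x0 * x0) >= eps:
            return False
    return True


print(delta_for_square(3.0, 0.1), check(3.0, 0.1))
```

Note that the $\delta$ produced depends on both $\varepsilon$ and $x_0$, which is permitted by the definition of continuity at a point.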

Remark 9.4.8. A particularly useful consequence of Proposition 9.4.7
is the following: if $f$ is continuous at $x_0$, and $a_n \to x_0$ as $n \to \infty$, then
$f(a_n) \to f(x_0)$ as $n \to \infty$ (provided that all the elements of the sequence
$(a_n)_{n=0}^\infty$ lie in the domain of $f$, of course). Thus continuous functions
are very useful in computing limits.

The limit laws in Proposition 9.3.14, combined with the definition of 
continuity in Definition 9.4.1, immediately imply 

Proposition 9.4.9 (Arithmetic preserves continuity). Let $X$ be a subset
of $\mathbf{R}$, and let $f : X \to \mathbf{R}$ and $g : X \to \mathbf{R}$ be functions. Let $x_0 \in X$.
Then if $f$ and $g$ are both continuous at $x_0$, then the functions $f + g$,
$f - g$, $\max(f, g)$, $\min(f, g)$ and $fg$ are also continuous at $x_0$. If $g$ is
non-zero on $X$, then $f/g$ is also continuous at $x_0$.

In particular, the sum, difference, maximum, minimum, and product 
of continuous functions are continuous; and the quotient of two continu- 
ous functions is continuous as long as the denominator does not become 
zero. 

One can use Proposition 9.4.9 to show that a lot of functions are
continuous. For instance, just by starting from the fact that constant
functions are continuous, and the identity function $f(x) = x$ is
continuous (Exercise 9.4.2), one can show that the function
$g(x) := \max(x^3 + 4x^2 + x + 5, x^4 - x^3)/(x^2 - 4)$, for instance, is continuous at
every point of $\mathbf{R}$ except the two points $x = +2$, $x = -2$ where the
denominator vanishes.

Some other examples of continuous functions are given below.

Proposition 9.4.10 (Exponentiation is continuous, I). Let $a > 0$ be a
positive real number. Then the function $f : \mathbf{R} \to \mathbf{R}$ defined by $f(x) := a^x$
is continuous.

Proof. See Exercise 9.4.3. □ 

Proposition 9.4.11 (Exponentiation is continuous, II). Let $p$ be a real
number. Then the function $f : (0, \infty) \to \mathbf{R}$ defined by $f(x) := x^p$ is
continuous.

Proof. See Exercise 9.4.4. □ 

There is a stronger statement than Propositions 9.4.10, 9.4.11, 
namely that exponentiation is jointly continuous in both the exponent 
and the base, but this is harder to show; see Exercise 11.25.10. 

Proposition 9.4.12 (Absolute value is continuous). The function
$f : \mathbf{R} \to \mathbf{R}$ defined by $f(x) := |x|$ is continuous.

Proof. This follows since $|x| = \max(x, -x)$ and the functions $x$, $-x$ are
already continuous. □

The class of continuous functions is not only closed under addition, 
subtraction, multiplication, and division, but is also closed under com- 
position: 

Proposition 9.4.13 (Composition preserves continuity). Let $X$ and $Y$
be subsets of $\mathbf{R}$, and let $f : X \to Y$ and $g : Y \to \mathbf{R}$ be functions. Let $x_0$
be a point in $X$. If $f$ is continuous at $x_0$, and $g$ is continuous at $f(x_0)$,
then the composition $g \circ f : X \to \mathbf{R}$ is continuous at $x_0$.

Proof. See Exercise 9.4.5. □ 

Example 9.4.14. Since the function $f(x) := 3x + 1$ is continuous on all
of $\mathbf{R}$, and the function $g(x) := 5^x$ is continuous on all of $\mathbf{R}$, the function
$g \circ f(x) = 5^{3x+1}$ is continuous on all of $\mathbf{R}$. By several applications of the
above propositions, one can show that far more complicated functions,
e.g., $h(x) := |x^2 - 8x + 7|^{\sqrt{2}}/(x^2 + 1)$, are also continuous. (Why is this
function continuous?) There are still a few functions though that are
not yet easy to test for continuity, such as $k(x) := x^x$; this function can
be dealt with more easily once we have the machinery of logarithms,
which we will see in Section 11.25.

— Exercises —

Exercise 9.4.1. Prove Proposition 9.4.7. (Hint: this can largely be done by 
applying the previous propositions and lemmas. Note that to prove (a),(b), 
and (c) are equivalent, you do not have to prove all six equivalences, but you 
do have to prove at least three; for instance, showing that (a) implies (b), (b) 
implies (c), and (c) implies (a) will suffice, although this is not necessarily the 
shortest or simplest way to do this question.) 

Exercise 9.4.2. Let $X$ be a subset of $\mathbf{R}$, and let $c \in \mathbf{R}$. Show that the constant
function $f : X \to \mathbf{R}$ defined by $f(x) := c$ is continuous, and show that the
identity function $g : X \to \mathbf{R}$ defined by $g(x) := x$ is also continuous.

Exercise 9.4.3. Prove Proposition 9.4.10. (Hint: you can use Lemma 6.5.3, 
combined with the squeeze test (Corollary 6.4.14) and Proposition 6.7.3.) 

Exercise 9.4.4. Prove Proposition 9.4.11. (Hint: from limit laws (Proposition
9.3.14) one can show that $\lim_{x \to 1} x^n = 1$ for all integers $n$. From this and the
squeeze test (Corollary 6.4.14) deduce that $\lim_{x \to 1} x^p = 1$ for all real numbers
$p$. Finally, apply Proposition 6.7.3.)

Exercise 9.4.5. Prove Proposition 9.4.13. 

Exercise 9.4.6. Let $X$ be a subset of $\mathbf{R}$, and let $f : X \to \mathbf{R}$ be a continuous
function. If $Y$ is a subset of $X$, show that the restriction $f|_Y : Y \to \mathbf{R}$ of $f$ to
$Y$ is also a continuous function. (Hint: this is a simple result, but it requires
you to follow the definitions carefully.)

Exercise 9.4.7. Let $n \geq 0$ be an integer, and for each $0 \leq i \leq n$ let $c_i$ be a real
number. Let $P : \mathbf{R} \to \mathbf{R}$ be the function

$$P(x) := \sum_{i=0}^{n} c_i x^i;$$

such a function is known as a polynomial of one variable; a typical example is
$P(x) = 6x^4 - 3x^2 + 4$. Show that $P$ is continuous.

9.5 Left and right limits 

We now introduce the notion of left and right limits, which can
be thought of as two separate "halves" of the complete limit
$\lim_{x \to x_0; x \in X} f(x)$.

Definition 9.5.1 (Left and right limits). Let $X$ be a subset of $\mathbf{R}$, let
$f : X \to \mathbf{R}$ be a function, and let $x_0$ be a real number. If $x_0$ is an adherent
point of $X \cap (x_0, \infty)$, then we define the right limit $f(x_0+)$ of $f$ at $x_0$
by the formula

$$f(x_0+) := \lim_{x \to x_0; x \in X \cap (x_0, \infty)} f(x),$$

provided of course that this limit exists. Similarly, if $x_0$ is an adherent
point of $X \cap (-\infty, x_0)$, then we define the left limit $f(x_0-)$ of $f$ at $x_0$
by the formula

$$f(x_0-) := \lim_{x \to x_0; x \in X \cap (-\infty, x_0)} f(x),$$

again provided that the limit exists. (Thus in many cases $f(x_0+)$ and
$f(x_0-)$ will not be defined.)

Sometimes we use the shorthand notations

$$\lim_{x \to x_0+} f(x) := \lim_{x \to x_0; x \in X \cap (x_0, \infty)} f(x);$$

$$\lim_{x \to x_0-} f(x) := \lim_{x \to x_0; x \in X \cap (-\infty, x_0)} f(x)$$

when the domain $X$ of $f$ is clear from context.

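Example 9.5.2. Consider the signum function $\operatorname{sgn} : \mathbf{R} \to \mathbf{R}$ defined in
Example 9.3.16. We have

$$\operatorname{sgn}(0+) = \lim_{x \to 0; x \in \mathbf{R} \cap (0, \infty)} \operatorname{sgn}(x) = \lim_{x \to 0; x \in \mathbf{R} \cap (0, \infty)} 1 = 1$$

and

$$\operatorname{sgn}(0-) = \lim_{x \to 0; x \in \mathbf{R} \cap (-\infty, 0)} \operatorname{sgn}(x) = \lim_{x \to 0; x \in \mathbf{R} \cap (-\infty, 0)} -1 = -1,$$

while $\operatorname{sgn}(0) = 0$ by definition.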
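
The one-sided behaviour in Example 9.5.2 is easy to tabulate. The following Python sketch is purely illustrative (sampling finitely many points cannot establish a limit); the list names are our own.

```python
def sgn(x: float) -> int:
    """The signum function: 1 for x > 0, 0 for x = 0, -1 for x < 0."""
    if x > 0:
        return 1
    if x < 0:
        return -1
    return 0


# Points approaching 0 from the right and from the left.
right_values = [sgn(10.0 ** (-k)) for k in range(1, 10)]
left_values = [sgn(-(10.0 ** (-k))) for k in range(1, 10)]

print(right_values)  # samples consistent with sgn(0+) = 1
print(left_values)   # samples consistent with sgn(0-) = -1
print(sgn(0.0))      # the value at 0 itself
```

The three printed results differ from one another, matching the fact that the two one-sided limits and the value at 0 are all distinct.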

Note that $f$ does not necessarily have to be defined at $x_0$ in order
for $f(x_0+)$ or $f(x_0-)$ to be defined. For instance, if $f : \mathbf{R} - \{0\} \to \mathbf{R}$
is the function $f(x) := x/|x|$, then $f(0+) = 1$ and $f(0-) = -1$ (why?),
even though $f(0)$ is undefined.

From Proposition 9.4.7 we see that if the right limit $f(x_0+)$ exists,
and $(a_n)_{n=0}^\infty$ is a sequence in $X$ converging to $x_0$ from the right (i.e.,
$a_n > x_0$ for all $n \in \mathbf{N}$), then $\lim_{n \to \infty} f(a_n) = f(x_0+)$. Similarly, if
the left limit $f(x_0-)$ exists, and $(b_n)_{n=0}^\infty$ is a sequence in $X$ converging
to $x_0$ from the left (i.e., $b_n < x_0$ for all $n \in \mathbf{N}$), then
$\lim_{n \to \infty} f(b_n) = f(x_0-)$.

Let $x_0$ be an adherent point of both $X \cap (x_0, \infty)$ and $X \cap (-\infty, x_0)$.
If $f$ is continuous at $x_0$, it is clear from Proposition 9.4.7 that $f(x_0+)$
and $f(x_0-)$ both exist and are equal to $f(x_0)$. (Can you see why?) A
converse is also true (compare this with Proposition 6.4.12(f)):

Proposition 9.5.3. Let $X$ be a subset of $\mathbf{R}$ containing a real number
$x_0$, and suppose that $x_0$ is an adherent point of both $X \cap (x_0, \infty)$ and
$X \cap (-\infty, x_0)$. Let $f : X \to \mathbf{R}$ be a function. If $f(x_0+)$ and $f(x_0-)$
both exist and are both equal to $f(x_0)$, then $f$ is continuous at $x_0$.

Proof. Let us write $L := f(x_0)$. Then by hypothesis we have

$$\lim_{x \to x_0; x \in X \cap (x_0, \infty)} f(x) = L \qquad (9.1)$$

and

$$\lim_{x \to x_0; x \in X \cap (-\infty, x_0)} f(x) = L. \qquad (9.2)$$

Let $\varepsilon > 0$ be given. From (9.1) and Proposition 9.4.7 (applied to the
restriction of $f$ to $X \cap (x_0, +\infty)$), we know that there exists a $\delta_+ > 0$
such that $|f(x) - L| < \varepsilon$ for all $x \in X \cap (x_0, \infty)$ for which $|x - x_0| < \delta_+$.
From (9.2) we similarly know that there exists a $\delta_- > 0$ such that
$|f(x) - L| < \varepsilon$ for all $x \in X \cap (-\infty, x_0)$ for which $|x - x_0| < \delta_-$. Now
let $\delta := \min(\delta_-, \delta_+)$; then $\delta > 0$ (why?), and suppose that $x \in X$ is
such that $|x - x_0| < \delta$. Then there are three cases: $x > x_0$, $x = x_0$, and
$x < x_0$, but in all three cases we know that $|f(x) - L| < \varepsilon$. (Why? The
reason is different in each of the three cases.) By Proposition 9.4.7 we
thus have that $f$ is continuous at $x_0$, as desired. □

As we saw with the signum function in Example 9.3.16, it is possible
for the left and right limits $f(x_0-)$, $f(x_0+)$ of a function $f$ at a point $x_0$
to both exist, but not be equal to each other; when this happens, we say
that $f$ has a jump discontinuity at $x_0$. Thus, for instance, the signum
function has a jump discontinuity at zero. Also, it is possible for the left
and right limits $f(x_0-)$, $f(x_0+)$ to exist and be equal to each other, but
not be equal to $f(x_0)$; when this happens we say that $f$ has a removable
discontinuity (or removable singularity) at $x_0$. For instance, if we take
$f : \mathbf{R} \to \mathbf{R}$ to be the function

$$f(x) := \begin{cases} 1 & \text{if } x = 0 \\ 0 & \text{if } x \neq 0, \end{cases}$$

then $f(0+)$ and $f(0-)$ both exist and equal 0 (why?), but $f(0)$ equals
1; thus $f$ has a removable discontinuity at 0.

Remark 9.5.4. Jump discontinuities and removable discontinuities are
not the only way a function can be discontinuous. Another way is for a
function to go to infinity at the discontinuity: for instance, the function
$f : \mathbf{R} - \{0\} \to \mathbf{R}$ defined by $f(x) := 1/x$ has a discontinuity at 0 which is
neither a jump discontinuity nor a removable singularity; informally, $f(x)$
converges to $+\infty$ when $x$ approaches 0 from the right, and converges to
$-\infty$ when $x$ approaches 0 from the left. These types of singularities are
sometimes known as asymptotic discontinuities. There are also oscillatory
discontinuities, where the function remains bounded but still does
not have a limit near $x_0$. For instance, the function $f : \mathbf{R} \to \mathbf{R}$ defined
by

$$f(x) := \begin{cases} 1 & \text{if } x \in \mathbf{Q} \\ 0 & \text{if } x \notin \mathbf{Q} \end{cases}$$

has an oscillatory discontinuity at 0 (and in fact at any other real number
also). This is because the function does not have left or right limits at
0, despite the fact that the function is bounded.

The study of discontinuities (also called singularities ) continues fur- 
ther, but is beyond the scope of this text. For instance, singularities 
play a key role in complex analysis. 

— Exercises — 

Exercise 9.5.1. Let $E$ be a subset of $\mathbf{R}$, let $f : E \to \mathbf{R}$ be a function, and let $x_0$
be an adherent point of $E$. Write down a definition of what it would mean for
the limit $\lim_{x \to x_0; x \in E} f(x)$ to exist and equal $+\infty$ or $-\infty$. If $f : \mathbf{R} \setminus \{0\} \to \mathbf{R}$
is the function $f(x) := 1/x$, use your definition to conclude $f(0+) = +\infty$ and
$f(0-) = -\infty$. Also, state and prove some analogue of Proposition 9.3.9 when
$L = +\infty$ or $L = -\infty$.


9.6 The maximum principle 

In the previous two sections we saw that a large number of functions 
were continuous, though certainly not all functions were continuous. 
We now show that continuous functions enjoy a number of other useful 
properties, especially if their domain is a closed interval. It is here 
that we shall begin exploiting the full power of the Heine-Borel theorem 
(Theorem 9.1.24). 

Definition 9.6.1. Let $X$ be a subset of $\mathbf{R}$, and let $f : X \to \mathbf{R}$ be a
function. We say that $f$ is bounded from above if there exists a real
number $M$ such that $f(x) \leq M$ for all $x \in X$. We say that $f$ is bounded
from below if there exists a real number $M$ such that $f(x) \geq -M$ for all
$x \in X$. We say that $f$ is bounded if there exists a real number $M$ such
that $|f(x)| \leq M$ for all $x \in X$.

Remark 9.6.2. A function is bounded if and only if it is bounded both
from above and below. (Why? Note that one part of the "if and only
if" is slightly trickier than the other.) Also, a function $f : X \to \mathbf{R}$ is
bounded if and only if its image $f(X)$ is a bounded set in the sense of
Definition 9.1.22 (why?).

Not all continuous functions are bounded. For instance, the function
$f(x) := x$ on the domain $\mathbf{R}$ is continuous but unbounded (why?),
although it is bounded on some smaller domains, such as $[1, 2]$. The
function $f(x) := 1/x$ is continuous but unbounded on $(0, 1)$ (why?),
though it is continuous and bounded on $[1, 2]$ (why?). However, if the
domain of the continuous function is a closed and bounded interval, then
we do have boundedness:

Lemma 9.6.3. Let $a < b$ be real numbers, and let $f : [a, b] \to \mathbf{R}$ be a
function continuous on $[a, b]$. Then $f$ is a bounded function.

Proof. Suppose for sake of contradiction that $f$ is not bounded. Thus
for every real number $M$ there exists an element $x \in [a, b]$ such that
$|f(x)| \geq M$.

In particular, for every natural number $n$, the set
$\{x \in [a, b] : |f(x)| \geq n\}$ is non-empty. We can thus choose$^2$ a sequence
$(x_n)_{n=0}^\infty$ in $[a, b]$ such that $|f(x_n)| \geq n$ for all $n$. This sequence lies
in $[a, b]$, and so by Theorem 9.1.24 there exists a subsequence
$(x_{n_j})_{j=0}^\infty$ which converges to some limit $L \in [a, b]$, where
$n_0 < n_1 < n_2 < \cdots$ is an increasing sequence of natural numbers. In
particular, we see that $n_j \geq j$ for all $j \in \mathbf{N}$ (why? use induction).

Since $f$ is continuous on $[a, b]$, it is continuous at $L$, and in particular
we see that

$$\lim_{j \to \infty} f(x_{n_j}) = f(L).$$

Thus the sequence $(f(x_{n_j}))_{j=0}^\infty$ is convergent, and hence it is bounded.
On the other hand, we know from the construction that
$|f(x_{n_j})| \geq n_j \geq j$ for all $j$, and hence the sequence $(f(x_{n_j}))_{j=0}^\infty$ is not
bounded, a contradiction. □

$^2$ Strictly speaking, this requires the axiom of choice, as in Lemma 8.4.5. However,
one can also proceed without the axiom of choice, by defining
$x_n := \sup\{x \in [a, b] : |f(x)| \geq n\}$, and using the continuity of $f$ to show that
$|f(x_n)| \geq n$. We leave the details to the reader.

Remark 9.6.4. There are two things about this proof that are worth
noting. Firstly, it shows how useful the Heine-Borel theorem
(Theorem 9.1.24) is. Secondly, it is an indirect proof; it doesn't say how to
find the bound for $f$, but it shows that having $f$ unbounded leads to a
contradiction.

We now improve Lemma 9.6.3 to say something more. 

Definition 9.6.5 (Maxima and minima). Let $f : X \to \mathbf{R}$ be a function,
and let $x_0 \in X$. We say that $f$ attains its maximum at $x_0$ if we have
$f(x_0) \geq f(x)$ for all $x \in X$ (i.e., the value of $f$ at the point $x_0$ is larger
than or equal to the value of $f$ at any other point in $X$). We say that $f$
attains its minimum at $x_0$ if we have $f(x_0) \leq f(x)$ for all $x \in X$.

Remark 9.6.6. If a function attains its maximum somewhere, then it
must be bounded from above (why?). Similarly if it attains its minimum
somewhere, then it must be bounded from below. These notions of maxima
and minima are global; local versions will be defined in Definition
10.2.1.

Proposition 9.6.7 (Maximum principle). Let $a < b$ be real numbers,
and let $f : [a, b] \to \mathbf{R}$ be a function continuous on $[a, b]$. Then $f$ attains
its maximum at some point $x_{\max} \in [a, b]$, and also attains its minimum
at some point $x_{\min} \in [a, b]$.

Remark 9.6.8. Strictly speaking, “maximum principle” is a misnomer, 
since the principle also concerns the minimum. Perhaps a more precise 
name would have been “extremum principle”; the word “extremum” is 
used to denote either a maximum or a minimum. 

Proof. We shall just show that $f$ attains its maximum somewhere; the
proof that it also attains its minimum is similar and is left to the reader.

From Lemma 9.6.3 we know that $f$ is bounded, thus there exists an
$M$ such that $-M \leq f(x) \leq M$ for each $x \in [a, b]$. Now let $E$ denote the
set

$$E := \{f(x) : x \in [a, b]\}.$$

(In other words, $E := f([a, b])$.) By what we just said, this set is a
subset of $[-M, M]$. It is also non-empty, since it contains for instance
the point $f(a)$. Hence by the least upper bound principle, it has a
supremum $\sup(E)$ which is a real number.

Write $m := \sup(E)$. By definition of supremum, we know that $y \leq m$
for all $y \in E$; by definition of $E$, this means that $f(x) \leq m$ for all
$x \in [a, b]$. Thus to show that $f$ attains its maximum somewhere, it will
suffice to find an $x_{\max} \in [a, b]$ such that $f(x_{\max}) = m$. (Why will this
suffice?)

Let $n \geq 1$ be any integer. Then $m - \frac{1}{n} < m = \sup(E)$. As $\sup(E)$ is
the least upper bound for $E$, $m - \frac{1}{n}$ cannot be an upper bound for $E$,
thus there exists a $y \in E$ such that $m - \frac{1}{n} < y$. By definition of $E$, this
implies that there exists an $x \in [a, b]$ such that $m - \frac{1}{n} < f(x)$.

We now choose a sequence $(x_n)_{n=1}^\infty$ by choosing, for each $n$, $x_n$ to be
an element of $[a, b]$ such that $m - \frac{1}{n} < f(x_n)$. (Again, this requires the
axiom of choice; however it is possible to prove this principle without
the axiom of choice. For instance, you will see a better proof of this
proposition using the notion of compactness in Proposition 11.10.2.)
This is a sequence in $[a, b]$; by the Heine-Borel theorem (Theorem 9.1.24),
we can thus find a subsequence $(x_{n_j})_{j=1}^\infty$, where $n_1 < n_2 < \cdots$, which
converges to some limit $x_{\max} \in [a, b]$. Since $(x_{n_j})_{j=1}^\infty$ converges to $x_{\max}$,
and $f$ is continuous at $x_{\max}$, we have as before that

$$\lim_{j \to \infty} f(x_{n_j}) = f(x_{\max}).$$

On the other hand, by construction we know that

$$f(x_{n_j}) > m - \frac{1}{n_j} \geq m - \frac{1}{j},$$

and so by taking limits of both sides we see that

$$f(x_{\max}) = \lim_{j \to \infty} f(x_{n_j}) \geq \lim_{j \to \infty} m - \frac{1}{j} = m.$$

On the other hand, we know that $f(x) \leq m$ for all $x \in [a, b]$, so in
particular $f(x_{\max}) \leq m$. Combining these two inequalities we see that
$f(x_{\max}) = m$ as desired. □

Note that the maximum principle does not prevent a function from
attaining its maximum or minimum at more than one point. For instance,
the function $f(x) := x^2$ on the interval $[-2, 2]$ attains its maximum
at two different points, at $-2$ and at $2$.

Let us write $\sup_{x \in [a,b]} f(x)$ as short-hand for $\sup\{f(x) : x \in [a, b]\}$,
and similarly define $\inf_{x \in [a,b]} f(x)$. The maximum principle thus asserts
that $m := \sup_{x \in [a,b]} f(x)$ is a real number and is the maximum value
of $f$ on $[a, b]$, i.e., there is at least one point $x_{\max}$ in $[a, b]$ for which
$f(x_{\max}) = m$, and for every other $x \in [a, b]$, $f(x)$ is less than or equal
to $m$. Similarly $\inf_{x \in [a,b]} f(x)$ is the minimum value of $f$ on $[a, b]$.
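
The quantity $\sup_{x \in [a,b]} f(x)$ can be estimated by sampling, which also shows a maximum attained at two points. The Python sketch below is an assumption-laden illustration: `grid_max` is a hypothetical helper, and a finite grid only yields a lower estimate of the supremum in general. In this example the grid happens to contain the true maximisers $-2$ and $2$.

```python
def grid_max(f, a: float, b: float, steps: int = 1000):
    """Return (largest sampled value, sample points attaining it) on [a, b]."""
    xs = [a + (b - a) * i / steps for i in range(steps + 1)]
    best = max(f(x) for x in xs)
    return best, [x for x in xs if f(x) == best]


def square(x: float) -> float:
    return x * x


best, argmax = grid_max(square, -2.0, 2.0)
print(best, argmax)
```

The maximum principle guarantees that a maximiser exists for any continuous function on $[a, b]$; it does not say how to find one, and a grid search such as this one is merely a heuristic.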

We now know that on a closed interval, every continuous function is 
bounded and attains its maximum at least once and minimum at least 
once. The same is not true for open or infinite intervals; see Exercise 
9.6.1. 

Remark 9.6.9. You may encounter a rather different “maximum prin- 
ciple” in complex analysis or partial differential equations, involving 
analytic functions and harmonic functions respectively, instead of con- 
tinuous functions. Those maximum principles are not directly related 
to this one (though they are also concerned with whether maxima exist, 
and where the maxima are located). 

— Exercises — 

Exercise 9.6.1. Give examples of

(a) a function $f : (1, 2) \to \mathbf{R}$ which is continuous and bounded, attains its
minimum somewhere, but does not attain its maximum anywhere;

(b) a function $f : [0, \infty) \to \mathbf{R}$ which is continuous, bounded, attains its
maximum somewhere, but does not attain its minimum anywhere;

(c) a function $f : [-1, 1] \to \mathbf{R}$ which is bounded but does not attain its
minimum anywhere or its maximum anywhere;

(d) a function $f : [-1, 1] \to \mathbf{R}$ which has no upper bound and no lower
bound.

Explain why none of the examples you construct violate the maximum principle.
(Note: read the assumptions carefully!)

9.7 The intermediate value theorem 

We have just shown that a continuous function attains both its maximum
value and its minimum value. We now show that $f$ also attains every
value in between. To do this, we first prove a very intuitive theorem:

Theorem 9.7.1 (Intermediate value theorem). Let $a < b$, and let
$f : [a, b] \to \mathbf{R}$ be a continuous function on $[a, b]$. Let $y$ be a real number
between $f(a)$ and $f(b)$, i.e., either $f(a) \leq y \leq f(b)$ or $f(a) \geq y \geq f(b)$.
Then there exists $c \in [a, b]$ such that $f(c) = y$.

Proof. We have two cases: $f(a) \leq y \leq f(b)$ or $f(a) \geq y \geq f(b)$. We will
assume the former, that $f(a) \leq y \leq f(b)$; the latter is proven similarly
and is left to the reader.

If $y = f(a)$ or $y = f(b)$ then the claim is easy, as one can simply set
$c = a$ or $c = b$, so we will assume that $f(a) < y < f(b)$. Let $E$ denote
the set

$$E := \{x \in [a, b] : f(x) < y\}.$$

Clearly $E$ is a subset of $[a, b]$, and is hence bounded. Also, since $f(a) < y$,
we see that $a$ is an element of $E$, so $E$ is non-empty. By the least upper
bound principle, the supremum

$$c := \sup(E)$$

is thus finite. Since $E$ is bounded by $b$, we know that $c \leq b$; since $E$
contains $a$, we know that $c \geq a$. Thus we have $c \in [a, b]$. To complete
the proof we now show that $f(c) = y$. The idea is to work from the left
of $c$ to show that $f(c) \leq y$, and to work from the right of $c$ to show that
$f(c) \geq y$.

Let $n \geq 1$ be an integer. The number $c - \frac{1}{n}$ is less than $c = \sup(E)$
and hence cannot be an upper bound for $E$. Thus there exists a point,
call it $x_n$, which lies in $E$ and which is greater than $c - \frac{1}{n}$. Also $x_n \leq c$
since $c$ is an upper bound for $E$. Thus

$$c - \frac{1}{n} < x_n \leq c.$$

By the squeeze test (Corollary 6.4.14) we thus have $\lim_{n \to \infty} x_n = c$.
Since $f$ is continuous at $c$, this implies that $\lim_{n \to \infty} f(x_n) = f(c)$. But
since $x_n$ lies in $E$ for every $n$, we have $f(x_n) < y$ for every $n$. By
the comparison principle (Lemma 6.4.13) we thus have $f(c) \leq y$. Since
$f(b) > f(c)$, we conclude $c \neq b$.

Since $c \neq b$ and $c \in [a, b]$, we must have $c < b$. In particular there
is an $N > 0$ such that $c + \frac{1}{n} < b$ for all $n > N$ (since $c + \frac{1}{n}$ converges
to $c$ as $n \to \infty$). Since $c$ is the supremum of $E$ and $c + \frac{1}{n} > c$, we
thus have $c + \frac{1}{n} \notin E$ for all $n > N$. Since $c + \frac{1}{n} \in [a, b]$, we thus have
$f(c + \frac{1}{n}) \geq y$ for all $n > N$. But $c + \frac{1}{n}$ converges to $c$, and $f$ is continuous
at $c$, thus $f(c) \geq y$. But we already knew that $f(c) \leq y$, thus $f(c) = y$,
as desired. □

The intermediate value theorem says that if $f$ takes the values $f(a)$
and $f(b)$, then it must also take all the values in between. Note that if
$f$ is not assumed to be continuous, then the intermediate value theorem
no longer applies. For instance, if $f : [-1, 1] \to \mathbf{R}$ is the function

$$f(x) := \begin{cases} -1 & \text{if } x \leq 0 \\ 1 & \text{if } x > 0 \end{cases}$$

then $f(-1) = -1$, and $f(1) = 1$, but there is no $c \in [-1, 1]$ for which
$f(c) = 0$. Thus if a function is discontinuous, it can "jump" past
intermediate values; however continuous functions cannot do so.

Remark 9.7.2. A continuous function may take an intermediate value
multiple times. For instance, if $f : [-2, 2] \to \mathbf{R}$ is the function
$f(x) := x^3 - x$, then $f(-2) = -6$ and $f(2) = 6$, so we know that there exists
a $c \in [-2, 2]$ for which $f(c) = 0$. In fact, in this case there exist three
such values of $c$: we have $f(-1) = f(0) = f(1) = 0$.

Remark 9.7.3. The intermediate value theorem gives another way to
show that one can take $n^{\text{th}}$ roots of a number. For instance, to construct
the square root of 2, consider the function $f : [0, 2] \to \mathbf{R}$ defined by
$f(x) = x^2$. This function is continuous, with $f(0) = 0$ and $f(2) = 4$.
Thus there exists a $c \in [0, 2]$ such that $f(c) = 2$, i.e., $c^2 = 2$. (This
argument does not show that there is just one square root of 2, but it
does prove that there is at least one square root of 2.)
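
The intermediate value theorem only asserts that such a $c$ exists. To approximate one numerically, a common technique is bisection, which is a different argument from the supremum-based proof above and is offered here only as a sketch. The function name `bisect_value` and the tolerance are our own assumptions, and the code presumes that $f$ is continuous with $f(a) \leq y \leq f(b)$.

```python
def bisect_value(f, a: float, b: float, y: float, tol: float = 1e-12) -> float:
    """Approximate some c in [a, b] with f(c) = y.

    Assumes f is continuous on [a, b] and f(a) <= y <= f(b).
    """
    if not (f(a) <= y <= f(b)):
        raise ValueError("y must lie between f(a) and f(b)")
    lo, hi = a, b
    # Invariant: f(lo) <= y <= f(hi), so by the intermediate value theorem
    # the interval [lo, hi] always contains a solution of f(c) = y.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) <= y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


c = bisect_value(lambda x: x * x, 0.0, 2.0, 2.0)
print(c, c * c)
```

As in Remark 9.7.2, when several solutions exist the procedure returns only one of them; which one depends on the sequence of midpoints.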

Corollary 9.7.4 (Images of continuous functions). Let $a < b$, and let
$f : [a, b] \to \mathbf{R}$ be a continuous function on $[a, b]$. Let $M := \sup_{x \in [a,b]} f(x)$
be the maximum value of $f$, and let $m := \inf_{x \in [a,b]} f(x)$ be the minimum
value. Let $y$ be a real number between $m$ and $M$ (i.e., $m \leq y \leq M$).
Then there exists a $c \in [a, b]$ such that $f(c) = y$. Furthermore, we have
$f([a, b]) = [m, M]$.

Proof. See Exercise 9.7.1. □ 


— Exercises — 

Exercise 9.7.1. Prove Corollary 9.7.4. (Hint: you may need Exercise 9.4.6 in 
addition to the intermediate value theorem.) 

Exercise 9.7.2. Let $f : [0, 1] \to [0, 1]$ be a continuous function. Show that
there exists a real number $x$ in $[0, 1]$ such that $f(x) = x$. (Hint: apply the
intermediate value theorem to the function $f(x) - x$.) This point $x$ is known
as a fixed point of $f$, and this result is a basic example of a fixed point theorem;
such theorems play an important role in certain types of analysis.


9.8 Monotonic functions 


We now discuss a class of functions which is distinct from the class of 
continuous functions, but has somewhat similar properties: the class of 
monotone (or monotonic) functions. 

Definition 9.8.1 (Monotonic functions). Let $X$ be a subset of $\mathbf{R}$, and
let $f : X \to \mathbf{R}$ be a function. We say that $f$ is monotone increasing iff
$f(y) \geq f(x)$ whenever $x, y \in X$ and $y > x$. We say that $f$ is strictly
monotone increasing iff $f(y) > f(x)$ whenever $x, y \in X$ and $y > x$.
Similarly, we say $f$ is monotone decreasing iff $f(y) \leq f(x)$ whenever
$x, y \in X$ and $y > x$, and strictly monotone decreasing iff $f(y) < f(x)$
whenever $x, y \in X$ and $y > x$. We say that $f$ is monotone if it is
monotone increasing or monotone decreasing, and strictly monotone if
it is strictly monotone increasing or strictly monotone decreasing.

Examples 9.8.2. The function $f(x) := x^2$, when restricted to the domain
$[0, \infty)$, is strictly monotone increasing (why?), but when restricted
instead to the domain $(-\infty, 0]$, is strictly monotone decreasing (why?).
Thus the function is strictly monotone on both $(-\infty, 0]$ and $[0, \infty)$, but
is not strictly monotone (or monotone) on the full real line $(-\infty, \infty)$.
Note that if a function is strictly monotone on a domain $X$, it is
automatically monotone as well on the same domain $X$. The constant
function $f(x) := 6$, when restricted to an arbitrary domain $X \subseteq \mathbf{R}$, is
both monotone increasing and monotone decreasing, but is not strictly
monotone (unless $X$ consists of at most one point - why?).
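
The claims in Examples 9.8.2 can be spot-checked on finite samples. In the Python sketch below, the helper names `is_increasing_on` and `is_decreasing_on` and the sample grids are our own assumptions. Passing the check on a sample does not prove monotonicity on the whole domain, although failing it does disprove monotonicity.

```python
def is_increasing_on(f, points, strict: bool = False) -> bool:
    """Check f(y) >= f(x) (or f(y) > f(x) if strict) for all sampled x < y."""
    xs = sorted(set(points))
    values = [f(x) for x in xs]
    # Comparing neighbours suffices, because >= and > are transitive.
    pairs = list(zip(values, values[1:]))
    if strict:
        return all(u < v for u, v in pairs)
    return all(u <= v for u, v in pairs)


def is_decreasing_on(f, points, strict: bool = False) -> bool:
    """f is (strictly) decreasing exactly when -f is (strictly) increasing."""
    return is_increasing_on(lambda x: -f(x), points, strict)


def square(x: float) -> float:
    return x * x


def constant(x: float) -> float:
    return 6.0


right_half = [i / 10 for i in range(0, 31)]   # a sample of [0, 3]
left_half = [-i / 10 for i in range(0, 31)]   # a sample of [-3, 0]

print(is_increasing_on(square, right_half, strict=True))
print(is_decreasing_on(square, left_half, strict=True))
print(is_increasing_on(square, left_half + right_half))
```

The last call fails because the combined sample contains points on both sides of 0, where $x^2$ first decreases and then increases.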

Continuous functions are not necessarily monotone (consider for instance
the function $f(x) = x^2$ on $\mathbf{R}$), and monotone functions are not
necessarily continuous; for instance, consider the function
$f : [-1, 1] \to \mathbf{R}$ defined earlier by

$$f(x) := \begin{cases} -1 & \text{if } x \leq 0 \\ 1 & \text{if } x > 0. \end{cases}$$

Monotone functions obey the maximum principle (Exercise 9.8.1), but
not the intermediate value principle (Exercise 9.8.2). On the other hand,
it is possible for a monotone function to have many, many discontinuities
(Exercise 9.8.5).





If a function is both strictly monotone and continuous, then it has 
many nice properties. In particular, it is invertible: 

Proposition 9.8.3. Let $a < b$ be real numbers, and let $f : [a, b] \to \mathbf{R}$
be a function which is both continuous and strictly monotone increasing.
Then $f$ is a bijection from $[a, b]$ to $[f(a), f(b)]$, and the inverse
$f^{-1} : [f(a), f(b)] \to [a, b]$ is also continuous and strictly monotone increasing.

Proof. See Exercise 9.8.4. □ 

There is a similar proposition for functions which are strictly monotone
decreasing; see Exercise 9.8.4.

Example 9.8.4. Let $n$ be a positive integer and $R > 0$. Since the
function $f(x) := x^n$ is strictly increasing on the interval $[0, R]$, we see
from Proposition 9.8.3 that this function is a bijection from $[0, R]$ to
$[0, R^n]$, and hence there is an inverse from $[0, R^n]$ to $[0, R]$. This can
be used to give an alternate means to construct the $n^{\text{th}}$ root $x^{1/n}$ of a
number $x \in [0, R^n]$ than what was done in Lemma 5.6.5.
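
Proposition 9.8.3 guarantees that the inverse exists, and for a continuous, strictly increasing function it can be approximated by bisection. The following Python sketch is illustrative only: `inverse_on_interval` is a hypothetical helper, the tolerance is arbitrary, and floating-point results are approximations to the exact inverse.

```python
def inverse_on_interval(f, a: float, b: float, tol: float = 1e-12):
    """Return an approximate inverse of f on [f(a), f(b)].

    Assumes f is continuous and strictly monotone increasing on [a, b].
    """
    def f_inv(y: float) -> float:
        if not (f(a) <= y <= f(b)):
            raise ValueError("y must lie in [f(a), f(b)]")
        lo, hi = a, b
        # Invariant: f(lo) <= y <= f(hi); strict monotonicity makes the
        # solution of f(x) = y in [lo, hi] unique.
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if f(mid) < y:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    return f_inv


# The cube root on [0, 10^3], as the inverse of x^3 on [0, 10].
cube_root = inverse_on_interval(lambda x: x ** 3, 0.0, 10.0)
print(cube_root(27.0))
```

Here $n = 3$ and $R = 10$ in the notation of Example 9.8.4, so `cube_root` accepts inputs in $[0, 1000]$.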

— Exercises — 

Exercise 9.8.1. Explain why the maximum principle remains true if the
hypothesis that $f$ is continuous is replaced with $f$ being monotone, or with $f$
being strictly monotone. (You can use the same explanation for both cases.)

Exercise 9.8.2. Give an example to show that the intermediate value theorem
becomes false if the hypothesis that $f$ is continuous is replaced with $f$
being monotone, or with $f$ being strictly monotone. (You can use the same
counterexample for both cases.)

Exercise 9.8.3. Let $a < b$ be real numbers, and let $f : [a, b] \to \mathbf{R}$ be a function
which is both continuous and one-to-one. Show that $f$ is strictly monotone.
(Hint: divide into the three cases $f(a) < f(b)$, $f(a) = f(b)$, $f(a) > f(b)$. The
second case leads directly to a contradiction. In the first case, use contradiction
and the intermediate value theorem to show that $f$ is strictly monotone
increasing; in the third case, argue similarly to show $f$ is strictly monotone
decreasing.)

Exercise 9.8.4. Prove Proposition 9.8.3. (Hint: to show that / -1 is continu- 
ous, it is easiest to use the “epsilon-delta” definition of continuity, Proposition 
9.4.7(c).) Is the proposition still true if the continuity assumption is dropped, 
or if strict monotonicity is replaced just by monotonicity? How should one 
modify the proposition to deal with strictly monotone decreasing functions 
instead of strictly monotone increasing functions? 

Exercise 9.8.5. In this exercise we give an example of a function which has a
discontinuity at every rational point, but is continuous at every irrational. Since
the rationals are countable, we can write them as $\mathbf{Q} = \{q(0), q(1), q(2), \ldots\}$,
where $q : \mathbf{N} \to \mathbf{Q}$ is a bijection from $\mathbf{N}$ to $\mathbf{Q}$. Now define a function $g : \mathbf{Q} \to \mathbf{R}$
by setting $g(q(n)) := 2^{-n}$ for each natural number $n$; thus $g$ maps $q(0)$ to $1$, $q(1)$
to $2^{-1}$, etc. Since $\sum_{n=0}^\infty 2^{-n}$ is absolutely convergent, we see that $\sum_{r \in \mathbf{Q}} g(r)$
is also absolutely convergent. Now define the function $f : \mathbf{R} \to \mathbf{R}$ by

$$f(x) := \sum_{r \in \mathbf{Q} : r < x} g(r).$$

Since $\sum_{r \in \mathbf{Q}} g(r)$ is absolutely convergent, we know that $f(x)$ is well-defined
for every real number $x$.

(a) Show that $f$ is strictly monotone increasing. (Hint: you will need
Proposition 5.4.14.)

(b) Show that for every rational number $r$, $f$ is discontinuous at $r$. (Hint:
since $r$ is rational, $r = q(n)$ for some natural number $n$. Show that
$f(x) \ge f(r) + 2^{-n}$ for all $x > r$.)

(c) Show that for every irrational number $x$, $f$ is continuous at $x$. (Hint:
first demonstrate that the functions

$$f_n(x) := \sum_{r \in \mathbf{Q} : r < x,\ g(r) \ge 2^{-n}} g(r)$$

are continuous at $x$, and that $|f(x) - f_n(x)| \le 2^{-n}$.)

9.9 Uniform continuity 

We know that a continuous function on a closed interval $[a,b]$ remains
bounded (and in fact attains its maximum and minimum, by the
maximum principle). However, if we replace the closed interval by an open
interval, then continuous functions need not be bounded any more. An
example is the function $f : (0,2) \to \mathbf{R}$ defined by $f(x) := 1/x$. This
function is continuous at every point in $(0,2)$, and is hence continuous
on $(0,2)$, but is not bounded. Informally speaking, the problem here is
that while the function is indeed continuous at every point in the open
interval $(0,2)$, it becomes “less and less” continuous as one approaches
the endpoint $0$.

Let us analyze this phenomenon further, using the “epsilon-delta”
definition of continuity, Proposition 9.4.7(c). We know that if $f : X \to \mathbf{R}$
is continuous at a point $x_0$, then for every $\varepsilon > 0$ there exists a $\delta > 0$ such
that $f(x)$ will be $\varepsilon$-close to $f(x_0)$ whenever $x \in X$ is $\delta$-close to $x_0$. In
other words, we can force $f(x)$ to be $\varepsilon$-close to $f(x_0)$ if we ensure that $x$
is sufficiently close to $x_0$. One way of thinking about this is that around
every point $x_0$ there is an “island of stability” $(x_0 - \delta, x_0 + \delta)$, where the
function $f(x)$ doesn't stray by more than $\varepsilon$ from $f(x_0)$.

Example 9.9.1. Take the function $f(x) := 1/x$ mentioned above at
the point $x_0 = 1$. In order to ensure that $f(x)$ is $0.1$-close to $f(x_0)$, it
suffices to take $x$ to be $1/11$-close to $x_0$, since if $x$ is $1/11$-close to $x_0$
then $10/11 \le x \le 12/11$, and so $11/12 \le f(x) \le 11/10$, and so $f(x)$
is $0.1$-close to $f(x_0)$. Thus the “$\delta$” one needs to make $f(x)$ $0.1$-close to
$f(x_0)$ is about $1/11$ or so, at the point $x_0 = 1$.

Now let us look instead at the point $x_0 = 0.1$. The function $f(x) = 1/x$
is still continuous here, but we shall see the continuity is much
worse. In order to ensure that $f(x)$ is $0.1$-close to $f(x_0)$, we need $x$ to
be $1/1010$-close to $x_0$. Indeed, if $x$ is $1/1010$-close to $x_0$, then
$10/101 \le x \le 102/1010$, and so $9.901 \le f(x) \le 10.1$, so $f(x)$ is $0.1$-close to $f(x_0)$.
Thus one needs a much smaller “$\delta$” for the same value of $\varepsilon$; i.e., $f(x)$ is
much more “unstable” near $0.1$ than it is near $1$, in the sense that there
is a much smaller “island of stability” around $0.1$ than there is around $1$
(if one is interested in keeping $f(x)$ $0.1$-stable).
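
The two values of $\delta$ above can be reproduced by a short computation. This is a sketch under an assumption not stated in the text: for $x_0 > 0$ the worst case is $x = x_0 - \delta$, and solving $1/(x_0 - \delta) - 1/x_0 = \varepsilon$ gives $\delta = \varepsilon x_0^2 / (1 + \varepsilon x_0)$. The function name `largest_delta` is hypothetical.

```python
from fractions import Fraction


def largest_delta(x0, eps):
    """Largest delta such that |1/x - 1/x0| <= eps whenever |x - x0| <= delta.

    Assumes x0 > 0 and eps > 0; obtained by solving 1/(x0 - delta) - 1/x0 = eps.
    """
    x0, eps = Fraction(x0), Fraction(eps)
    return eps * x0 ** 2 / (1 + eps * x0)


print(largest_delta(1, Fraction(1, 10)))               # 1/11
print(largest_delta(Fraction(1, 10), Fraction(1, 10)))  # 1/1010
```

Exact rational arithmetic is used so that the output matches the fractions $1/11$ and $1/1010$ quoted in the example, rather than floating-point approximations of them.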

On the other hand, there are other continuous functions which do
not exhibit this behavior. Consider the function $g : (0,2) \to \mathbf{R}$ defined
by $g(x) := 2x$. Let us fix $\varepsilon = 0.1$ as before, and investigate the island
of stability around $x_0 = 1$. It is clear that if $x$ is $0.05$-close to $x_0$, then
$g(x)$ is $0.1$-close to $g(x_0)$; in this case we can take $\delta$ to be $0.05$ at $x_0 = 1$.
And if we move $x_0$ around, say if we set $x_0$ to $0.1$ instead, the $\delta$ does not
change: even when $x_0$ is set to $0.1$ instead of $1$, we see that $g(x)$ will
stay $0.1$-close to $g(x_0)$ whenever $x$ is $0.05$-close to $x_0$. Indeed, the same
$\delta$ works for every $x_0$. When this happens, we say that the function $g$ is
uniformly continuous. More precisely:

Definition 9.9.2 (Uniform continuity). Let $X$ be a subset of $\mathbf{R}$, and
let $f : X \to \mathbf{R}$ be a function. We say that $f$ is uniformly continuous if,
for every $\varepsilon > 0$, there exists a $\delta > 0$ such that $f(x)$ and $f(x_0)$ are $\varepsilon$-close
whenever $x, x_0 \in X$ are two points in $X$ which are $\delta$-close.

Remark 9.9.3. This definition should be compared with the notion
of continuity. From Proposition 9.4.7(c), we know that a function $f$ is
continuous if for every $\varepsilon > 0$, and every $x_0 \in X$, there is a $\delta > 0$ such
that $f(x)$ and $f(x_0)$ are $\varepsilon$-close whenever $x \in X$ is $\delta$-close to $x_0$. The
difference between uniform continuity and continuity is that in uniform
continuity one can take a single $\delta$ which works for all $x_0 \in X$; for
ordinary continuity, each $x_0 \in X$ might use a different $\delta$. Thus every
uniformly continuous function is continuous, but not conversely.

Example 9.9.4. (Informal) The function $f : (0,2) \to \mathbf{R}$ defined by
$f(x) := 1/x$ is continuous on $(0,2)$, but not uniformly continuous,
because the continuity (or more precisely, the dependence of $\delta$ on $\varepsilon$)
becomes worse and worse as $x \to 0$. (We will make this more precise in
Example 9.9.10.)

Recall that the notions of adherent point and of continuous
function had several equivalent formulations; both had “epsilon-delta” type
formulations (involving the notion of $\varepsilon$-closeness), and both had
“sequential” formulations (involving the convergence of sequences); see Lemma
9.1.14 and Proposition 9.3.9. The concept of uniform continuity can
similarly be phrased in a sequential formulation, this time using the concept
of equivalent sequences (cf. Definition 5.2.6, but we now generalize to
sequences of real numbers instead of rationals, and no longer require the
sequences to be Cauchy):

Definition 9.9.5 (Equivalent sequences). Let $m$ be an integer, let
$(a_n)_{n=m}^\infty$ and $(b_n)_{n=m}^\infty$ be two sequences of real numbers, and let $\varepsilon > 0$ be
given. We say that $(a_n)_{n=m}^\infty$ is $\varepsilon$-close to $(b_n)_{n=m}^\infty$ iff $a_n$ is $\varepsilon$-close to $b_n$
for each $n \ge m$. We say that $(a_n)_{n=m}^\infty$ is eventually $\varepsilon$-close to $(b_n)_{n=m}^\infty$
iff there exists an $N \ge m$ such that the sequences $(a_n)_{n=N}^\infty$ and $(b_n)_{n=N}^\infty$
are $\varepsilon$-close. Two sequences $(a_n)_{n=m}^\infty$ and $(b_n)_{n=m}^\infty$ are equivalent iff for
each $\varepsilon > 0$, the sequences $(a_n)_{n=m}^\infty$ and $(b_n)_{n=m}^\infty$ are eventually $\varepsilon$-close.

Remark 9.9.6. One could debate whether $\varepsilon$ should be assumed to be
rational or real, but a minor modification of Proposition 6.1.4 shows
that this does not make any difference to the above definitions.

The notion of equivalence can be phrased more succinctly using our 
language of limits: 

Lemma 9.9.7. Let $(a_n)_{n=1}^\infty$ and $(b_n)_{n=1}^\infty$ be sequences of real numbers
(not necessarily bounded or convergent). Then $(a_n)_{n=1}^\infty$ and $(b_n)_{n=1}^\infty$ are
equivalent if and only if $\lim_{n \to \infty} (a_n - b_n) = 0$.

Proof. See Exercise 9.9.1. □

Meanwhile, the notion of uniform continuity can be phrased using
equivalent sequences:

Proposition 9.9.8. Let $X$ be a subset of $\mathbf{R}$, and let $f : X \to \mathbf{R}$ be a
function. Then the following two statements are logically equivalent:

(a) $f$ is uniformly continuous on $X$.

(b) Whenever $(x_n)_{n=0}^\infty$ and $(y_n)_{n=0}^\infty$ are two equivalent sequences
consisting of elements of $X$, the sequences $(f(x_n))_{n=0}^\infty$ and $(f(y_n))_{n=0}^\infty$
are also equivalent.

Proof. See Exercise 9.9.2. □ 

Remark 9.9.9. The reader should compare this with Proposition 9.3.9.
Proposition 9.3.9 asserted that if $f$ was continuous, then $f$ maps
convergent sequences to convergent sequences. In contrast, Proposition
9.9.8 asserts that if $f$ is uniformly continuous, then $f$ maps equivalent
pairs of sequences to equivalent pairs of sequences. To see how the two
Propositions are connected, observe from Lemma 9.9.7 that $(x_n)_{n=0}^\infty$
will converge to $x_*$ if and only if the sequences $(x_n)_{n=0}^\infty$ and $(x_*)_{n=0}^\infty$ are
equivalent.

Example 9.9.10. Consider the function $f : (0,2) \to \mathbf{R}$ defined by
$f(x) := 1/x$ considered earlier. From Lemma 9.9.7 we see that the
sequences $(1/n)_{n=1}^\infty$ and $(1/2n)_{n=1}^\infty$ are equivalent sequences in $(0,2)$.
However, the sequences $(f(1/n))_{n=1}^\infty$ and $(f(1/2n))_{n=1}^\infty$ are not equivalent
(why? Use Lemma 9.9.7 again). So by Proposition 9.9.8, $f$ is not
uniformly continuous. (These sequences start at 1 instead of 0, but the
reader can easily see that this makes no difference to the above
discussion.)
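
The behaviour of these two pairs of sequences can be tabulated directly. The snippet below is an illustrative sketch rather than a proof, and the helper name `gaps` is an assumption: it prints the gap between the inputs $1/n$ and $1/2n$, which shrinks to zero, next to the gap between the outputs of $f(x) = 1/x$, which does not.

```python
from fractions import Fraction


def f(x):
    return 1 / x


def gaps(n):
    """Return (a_n - b_n, f(a_n) - f(b_n)) for a_n = 1/n and b_n = 1/(2n)."""
    a, b = Fraction(1, n), Fraction(1, 2 * n)
    return a - b, f(a) - f(b)


for n in (1, 10, 100, 1000):
    input_gap, output_gap = gaps(n)
    print(n, input_gap, output_gap)
```

The input gap is $1/2n$ while the output gap is $n - 2n = -n$, so by Lemma 9.9.7 the inputs are equivalent and the outputs are not.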

Example 9.9.11. Consider the function $f : \mathbf{R} \to \mathbf{R}$ defined by
$f(x) := x^2$. This is a continuous function on $\mathbf{R}$, but it turns out not
to be uniformly continuous; in some sense the continuity gets “worse
and worse” as one approaches infinity. One way to quantify this is
via Proposition 9.9.8. Consider the sequences $(n)_{n=1}^\infty$ and $(n + \frac{1}{n})_{n=1}^\infty$.
By Lemma 9.9.7, these sequences are equivalent. But the sequences
$(f(n))_{n=1}^\infty$ and $(f(n + \frac{1}{n}))_{n=1}^\infty$ are not equivalent, since
$f(n + \frac{1}{n}) = n^2 + 2 + \frac{1}{n^2} = f(n) + 2 + \frac{1}{n^2}$ does not become eventually $2$-close to
$f(n)$. By Proposition 9.9.8 we can thus conclude that $f$ is not uniformly
continuous.

Another property of uniformly continuous functions is that they map
Cauchy sequences to Cauchy sequences.

Proposition 9.9.12. Let $X$ be a subset of $\mathbf{R}$, and let $f : X \to \mathbf{R}$
be a uniformly continuous function. Let $(x_n)_{n=0}^\infty$ be a Cauchy sequence
consisting entirely of elements in $X$. Then $(f(x_n))_{n=0}^\infty$ is also a Cauchy
sequence.

Proof. See Exercise 9.9.3. □ 

Example 9.9.13. Once again, we demonstrate that the function
$f : (0,2) \to \mathbf{R}$ defined by $f(x) := 1/x$ is not uniformly continuous. The
sequence $(1/n)_{n=1}^\infty$ is a Cauchy sequence in $(0,2)$, but the sequence
$(f(1/n))_{n=1}^\infty$ is not a Cauchy sequence (why?). Thus by Proposition
9.9.12, $f$ is not uniformly continuous.

Corollary 9.9.14. Let $X$ be a subset of $\mathbf{R}$, let $f : X \to \mathbf{R}$ be a uniformly
continuous function, and let $x_0$ be an adherent point of $X$. Then the limit
$\lim_{x \to x_0; x \in X} f(x)$ exists (in particular, it is a real number).

Proof. See Exercise 9.9.4. □ 

We now show that a uniformly continuous function will map bounded 
sets to bounded sets. 

Proposition 9.9.15. Let $X$ be a subset of $\mathbf{R}$, and let $f : X \to \mathbf{R}$ be a
uniformly continuous function. Suppose that $E$ is a bounded subset of
$X$. Then $f(E)$ is also bounded.

Proof. See Exercise 9.9.5. □ 

As we have just seen repeatedly, not all continuous functions are 
uniformly continuous. However, if the domain of the function is a closed 
interval, then continuous functions are in fact uniformly continuous: 

Theorem 9.9.16. Let $a < b$ be real numbers, and let $f : [a,b] \to \mathbf{R}$
be a function which is continuous on $[a,b]$. Then $f$ is also uniformly
continuous.

Proof. Suppose for sake of contradiction that $f$ is not uniformly
continuous. By Proposition 9.9.8, there must therefore exist two equivalent
sequences $(x_n)_{n=0}^\infty$ and $(y_n)_{n=0}^\infty$ in $[a,b]$ such that the sequences $(f(x_n))_{n=0}^\infty$
and $(f(y_n))_{n=0}^\infty$ are not equivalent. In particular, we can find an $\varepsilon > 0$
such that $(f(x_n))_{n=0}^\infty$ and $(f(y_n))_{n=0}^\infty$ are not eventually $\varepsilon$-close.

Fix this value of $\varepsilon$, and let $E$ be the set

$$E := \{ n \in \mathbf{N} : f(x_n) \text{ and } f(y_n) \text{ are not } \varepsilon\text{-close} \}.$$

We must have $E$ infinite, since if $E$ were finite then $(f(x_n))_{n=0}^\infty$ and
$(f(y_n))_{n=0}^\infty$ would be eventually $\varepsilon$-close (why?). By Proposition 8.1.5, $E$
is countable; in fact from the proof of that proposition we see that we
can find an infinite sequence

$$n_0 < n_1 < n_2 < \cdots$$

consisting entirely of elements in $E$. In particular, we have

$$|f(x_{n_j}) - f(y_{n_j})| > \varepsilon \text{ for all } j \in \mathbf{N}. \qquad (9.3)$$

On the other hand, the sequence $(x_{n_j})_{j=0}^\infty$ is a sequence in $[a,b]$, and so by
the Heine-Borel theorem (Theorem 9.1.24) there must be a subsequence
$(x_{n_{j_k}})_{k=0}^\infty$ which converges to some limit $L$ in $[a,b]$. In particular, $f$ is
continuous at $L$, and so by Proposition 9.4.7,

$$\lim_{k \to \infty} f(x_{n_{j_k}}) = f(L). \qquad (9.4)$$

Note that $(x_{n_{j_k}})_{k=0}^\infty$ is a subsequence of $(x_n)_{n=0}^\infty$, and $(y_{n_{j_k}})_{k=0}^\infty$ is a
subsequence of $(y_n)_{n=0}^\infty$, by Lemma 6.6.4. On the other hand, from
Lemma 9.9.7 we have

$$\lim_{n \to \infty} (x_n - y_n) = 0.$$

By Proposition 6.6.5, we thus have

$$\lim_{k \to \infty} (x_{n_{j_k}} - y_{n_{j_k}}) = 0.$$

Since $x_{n_{j_k}}$ converges to $L$ as $k \to \infty$, we thus have by limit laws

$$\lim_{k \to \infty} y_{n_{j_k}} = L$$

and hence by continuity of $f$ at $L$

$$\lim_{k \to \infty} f(y_{n_{j_k}}) = f(L).$$

Subtracting this from (9.4) using limit laws, we obtain

$$\lim_{k \to \infty} \left( f(x_{n_{j_k}}) - f(y_{n_{j_k}}) \right) = 0.$$

But this contradicts (9.3) (why?). From this contradiction we conclude
that $f$ is in fact uniformly continuous. □

Remark 9.9.17. One should compare Lemma 9.6.3, Proposition 9.9.15,
and Theorem 9.9.16 with each other. No two of these results imply the
third, but they are all consistent with each other.

— Exercises — 

Exercise 9.9.1. Prove Lemma 9.9.7. 

Exercise 9.9.2. Prove Proposition 9.9.8. (Hint: you should avoid Lemma 9.9.7,
and instead go back to the definition of equivalent sequences in Definition 9.9.5.)

Exercise 9.9.3. Prove Proposition 9.9.12. (Hint: use Definition 9.9.2 directly.)

Exercise 9.9.4. Use Proposition 9.9.12 to prove Corollary 9.9.14. Use this
corollary to give an alternate demonstration of the results in Example 9.9.10.

Exercise 9.9.5. Prove Proposition 9.9.15. (Hint: mimic the proof of Lemma
9.6.3. At some point you will need either Proposition 9.9.12 or Corollary
9.9.14.)

Exercise 9.9.6. Let $X, Y, Z$ be subsets of $\mathbf{R}$. Let $f : X \to Y$ be a function
which is uniformly continuous on $X$, and let $g : Y \to Z$ be a function which is
uniformly continuous on $Y$. Show that the function $g \circ f : X \to Z$ is uniformly
continuous on $X$.

9.10 Limits at infinity 

Until now, we have discussed what it means for a function $f : X \to \mathbf{R}$
to have a limit as $x \to x_0$ as long as $x_0$ is a real number. We now briefly
discuss what it would mean to take limits when $x_0$ is equal to $+\infty$ or
$-\infty$. (This is part of a more general theory of continuous functions on
a topological space; see Section 11.12.)

First, we need a notion of what it means for $+\infty$ or $-\infty$ to be
adherent to a set.

Definition 9.10.1 (Infinite adherent points). Let $X$ be a subset of $\mathbf{R}$.
We say that $+\infty$ is adherent to $X$ iff for every $M \in \mathbf{R}$ there exists an
$x \in X$ such that $x > M$; we say that $-\infty$ is adherent to $X$ iff for every
$M \in \mathbf{R}$ there exists an $x \in X$ such that $x < M$.

In other words, $+\infty$ is adherent to $X$ iff $X$ has no upper bound, or
equivalently iff $\sup(X) = +\infty$. Similarly $-\infty$ is adherent to $X$ iff $X$ has
no lower bound, or iff $\inf(X) = -\infty$. Thus a set is bounded if and only
if $+\infty$ and $-\infty$ are not adherent points.

Remark 9.10.2. This definition may seem rather different from
Definition 9.1.8, but can be unified using the topological structure of the
extended real line $\mathbf{R}^*$, which we will not discuss here.

Definition 9.10.3 (Limits at infinity). Let $X$ be a subset of $\mathbf{R}$ with
$+\infty$ as an adherent point, and let $f : X \to \mathbf{R}$ be a function. We say that
$f(x)$ converges to $L$ as $x \to +\infty$ in $X$, and write $\lim_{x \to +\infty; x \in X} f(x) = L$,
iff for every $\varepsilon > 0$ there exists an $M$ such that $f$ is $\varepsilon$-close to $L$ on
$X \cap (M, +\infty)$ (i.e., $|f(x) - L| \le \varepsilon$ for all $x \in X$ such that $x > M$).
Similarly we say that $f(x)$ converges to $L$ as $x \to -\infty$ iff for every $\varepsilon > 0$
there exists an $M$ such that $f$ is $\varepsilon$-close to $L$ on $X \cap (-\infty, M)$.

Example 9.10.4. Let $f : (0, \infty) \to \mathbf{R}$ be the function $f(x) := 1/x$.
Then we have $\lim_{x \to +\infty; x \in (0,\infty)} 1/x = 0$. (Can you see why, from the
definition?)
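
As a numerical companion to this example (a sketch, not a substitute for the argument from Definition 9.10.3), one natural candidate is $M := 1/\varepsilon$: any $x > M$ then satisfies $|1/x - 0| < \varepsilon$. The names `witness_M` and `check`, and the choice of sample points, are assumptions made for illustration.

```python
def witness_M(eps):
    """Candidate threshold M for f(x) = 1/x: any x > 1/eps has |1/x - 0| < eps."""
    return 1 / eps


def check(eps, multipliers=(1.5, 10.0, 1e3, 1e6)):
    """Spot-check the bound at a few sample points x = M * t with t > 1."""
    M = witness_M(eps)
    return all(abs(1 / (M * t) - 0) <= eps for t in multipliers)


print(check(0.1))
print(check(1e-6))
```

A finite spot check of this kind can only fail to find a counterexample; the actual verification is the one-line inequality $1/x < 1/M = \varepsilon$ for $x > M$.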

One can do many of the same things with these limits at infinity as
we have been doing with limits at other points $x_0$; for instance, it turns
out that all of the limit laws continue to hold. However, as we will not be
using these limits much in this text, we will not devote much attention
to these matters. We will note though that this definition is consistent
with the notion of a limit $\lim_{n \to \infty} a_n$ of a sequence (Exercise 9.10.1).

— Exercises — 

Exercise 9.10.1. Let $(a_n)_{n=0}^\infty$ be a sequence of real numbers; then $a_n$ can also
be thought of as a function from $\mathbf{N}$ to $\mathbf{R}$, which takes each natural number $n$
to a real number $a_n$. Show that

$$\lim_{n \to +\infty; n \in \mathbf{N}} a_n = \lim_{n \to \infty} a_n$$

where the left-hand limit is defined by Definition 9.10.3 and the right-hand
limit is defined by Definition 6.1.8. More precisely, show that if one of the
above two limits exists then so does the other, and then they both have the
same value. Thus the two notions of limit here are compatible.

Differentiation of functions


10.1 Basic definitions 


We can now begin the rigorous treatment of calculus in earnest, starting
with the notion of a derivative. We can now define derivatives
analytically, using limits, in contrast to the geometric definition of derivatives,
which uses tangents. The advantage of working analytically is that (a)
we do not need to know the axioms of geometry, and (b) these definitions
can be modified to handle functions of several variables, or functions
whose values are vectors instead of scalars. Furthermore, one's geometric
intuition becomes difficult to rely on once one has more than three
dimensions in play. (Conversely, one can use one's experience in analytic
rigour to extend one's geometric intuition to such abstract settings; as
mentioned earlier, the two viewpoints complement rather than oppose
each other.)


Definition 10.1.1 (Differentiability at a point). Let $X$ be a subset of
$\mathbf{R}$, and let $x_0 \in X$ be an element of $X$ which is also a limit point of $X$.
Let $f : X \to \mathbf{R}$ be a function. If the limit

$$\lim_{x \to x_0; x \in X - \{x_0\}} \frac{f(x) - f(x_0)}{x - x_0}$$

converges to some real number $L$, then we say that $f$ is differentiable at
$x_0$ on $X$ with derivative $L$, and write $f'(x_0) := L$. If the limit does not
exist, or if $x_0$ is not an element of $X$ or not a limit point of $X$, we leave
$f'(x_0)$ undefined, and say that $f$ is not differentiable at $x_0$ on $X$.


Remark 10.1.2. Note that we need $x_0$ to be a limit point in order for
$x_0$ to be adherent to $X - \{x_0\}$, otherwise the limit

$$\lim_{x \to x_0; x \in X - \{x_0\}} \frac{f(x) - f(x_0)}{x - x_0}$$

would automatically be undefined. In particular, we do not define the
derivative of a function at an isolated point; for instance, if one restricts
the function $f : \mathbf{R} \to \mathbf{R}$ defined by $f(x) := x^2$ to the domain
$X := [1,2] \cup \{3\}$, then the restriction of the function ceases to be differentiable
at $3$. (See however Exercise 10.1.1 below.) In practice, the domain $X$
will almost always be an interval, and so by Lemma 9.1.21 all elements
$x_0$ of $X$ will automatically be limit points and we will not have to care
much about these issues.

Example 10.1.3. Let $f : \mathbf{R} \to \mathbf{R}$ be the function $f(x) := x^2$, and let
$x_0$ be any real number. To see whether $f$ is differentiable at $x_0$ on $\mathbf{R}$,
we compute the limit

$$\lim_{x \to x_0; x \in \mathbf{R} - \{x_0\}} \frac{f(x) - f(x_0)}{x - x_0} = \lim_{x \to x_0; x \in \mathbf{R} - \{x_0\}} \frac{x^2 - x_0^2}{x - x_0}.$$

We can factor the numerator as $(x^2 - x_0^2) = (x - x_0)(x + x_0)$. Since
$x \in \mathbf{R} - \{x_0\}$, we may legitimately cancel the factors of $x - x_0$ and write
the above limit as

$$\lim_{x \to x_0; x \in \mathbf{R} - \{x_0\}} x + x_0$$

which by limit laws is equal to $2x_0$. Thus the function $f(x)$ is
differentiable at $x_0$ and its derivative there is $2x_0$.
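
The cancellation in this example can be watched numerically. The sketch below is illustrative only (the helper names `difference_quotient` and `square` are assumptions): it evaluates the difference quotient of $x^2$ at $x_0 = 3$ in exact rational arithmetic, where it comes out as $2x_0 + h$ for $x = x_0 + h$.

```python
from fractions import Fraction


def difference_quotient(f, x0, x):
    """Return (f(x) - f(x0)) / (x - x0); x must differ from x0."""
    if x == x0:
        raise ValueError("x must differ from x0")
    return (f(x) - f(x0)) / (x - x0)


def square(x):
    return x * x


x0 = Fraction(3)
for k in range(1, 6):
    h = Fraction(1, 10 ** k)
    # After cancelling (x - x0), the quotient is x + x0 = 2*x0 + h.
    print(h, difference_quotient(square, x0, x0 + h))
```

As $h$ shrinks the printed values approach $2x_0 = 6$, in agreement with the limit computed above.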

Remark 10.1.4. This point is trivial, but it is worth mentioning: if
$f : X \to \mathbf{R}$ is differentiable at $x_0$, and $g : X \to \mathbf{R}$ is equal to $f$
(i.e., $g(x) = f(x)$ for all $x \in X$), then $g$ is also differentiable at $x_0$
and $g'(x_0) = f'(x_0)$ (why?). However, if two functions $f$ and $g$ merely
have the same value at $x_0$, i.e., $g(x_0) = f(x_0)$, this does not imply that
$g'(x_0) = f'(x_0)$. (Can you see a counterexample?) Thus there is a big
difference between two functions being equal on their whole domain, and
merely being equal at one point.

Remark 10.1.5. One sometimes writes $\frac{df}{dx}$ instead of $f'$. This notation
is of course very familiar and convenient, but one has to be a little
careful, because it is only safe to use as long as $x$ is the only variable
used to represent the input for $f$; otherwise one can get into all sorts of
trouble. For instance, the function $f : \mathbf{R} \to \mathbf{R}$ defined by $f(x) := x^2$
has derivative $\frac{df}{dx} = 2x$, but the function $g : \mathbf{R} \to \mathbf{R}$ defined by
$g(y) := y^2$ would seem to have derivative $\frac{dg}{dx} = 0$ if $y$ and $x$ are independent
variables, despite the fact that $g$ and $f$ are exactly the same function.
Because of this possible source of confusion, we will refrain from using
the notation $\frac{df}{dx}$ whenever it could possibly lead to confusion. (This
confusion becomes even worse in the calculus of several variables, and
the standard notation of $\frac{\partial f}{\partial x}$ can lead to some serious ambiguities. There
are ways to resolve these ambiguities, most notably by introducing the
notion of differentiation along vector fields, but this is beyond the scope
of this text.)


Example 10.1.6. Let $f : \mathbf{R} \to \mathbf{R}$ be the function $f(x) := |x|$, and let
$x_0 = 0$. To see whether $f$ is differentiable at $0$ on $\mathbf{R}$, we compute the
limit

$$\lim_{x \to 0; x \in \mathbf{R} - \{0\}} \frac{f(x) - f(0)}{x - 0} = \lim_{x \to 0; x \in \mathbf{R} - \{0\}} \frac{|x|}{x}.$$

Now we take left limits and right limits. The right limit is

$$\lim_{x \to 0; x \in (0,\infty)} \frac{|x|}{x} = \lim_{x \to 0; x \in (0,\infty)} \frac{x}{x} = \lim_{x \to 0; x \in (0,\infty)} 1 = 1,$$

while the left limit is

$$\lim_{x \to 0; x \in (-\infty,0)} \frac{|x|}{x} = \lim_{x \to 0; x \in (-\infty,0)} \frac{-x}{x} = \lim_{x \to 0; x \in (-\infty,0)} -1 = -1,$$

and these limits do not match. Thus $\lim_{x \to 0; x \in \mathbf{R} - \{0\}} \frac{|x|}{x}$ does not exist,
and $f$ is not differentiable at $0$ on $\mathbf{R}$. However, if one restricts $f$ to $[0,\infty)$,
then the restricted function $f|_{[0,\infty)}$ is differentiable at $0$ on $[0,\infty)$, with
derivative $1$:

$$\lim_{x \to 0; x \in [0,\infty) - \{0\}} \frac{f(x) - f(0)}{x - 0} = \lim_{x \to 0; x \in (0,\infty)} \frac{|x|}{x} = 1.$$

Similarly, when one restricts $f$ to $(-\infty,0]$, the restricted function
$f|_{(-\infty,0]}$ is differentiable at $0$ on $(-\infty,0]$, with derivative $-1$. Thus
even when a function is not differentiable, it is sometimes possible to
restore the differentiability by restricting the domain of the function.
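
The mismatch between the one-sided limits is easy to see in a quick computation. This is an illustrative sketch (the helper name `quotient_at_zero` is an assumption), evaluating the difference quotient of $|x|$ at $0$ along a sequence approaching $0$ from the right and another approaching from the left.

```python
from fractions import Fraction


def quotient_at_zero(x):
    """Difference quotient (|x| - |0|) / (x - 0) for nonzero x."""
    return abs(x) / x


right = [quotient_at_zero(Fraction(1, 10 ** k)) for k in range(1, 6)]
left = [quotient_at_zero(Fraction(-1, 10 ** k)) for k in range(1, 6)]
print(right)  # every entry equals 1
print(left)   # every entry equals -1
```

Because the quotient is constantly $1$ on $(0,\infty)$ and constantly $-1$ on $(-\infty,0)$, no single value can serve as the limit on $\mathbf{R} - \{0\}$, while each restriction has an obvious limit.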

If a function is differentiable at $x_0$, then it is approximately linear
near $x_0$:

Proposition 10.1.7 (Newton's approximation). Let $X$ be a subset of
$\mathbf{R}$, let $x_0 \in X$ be a limit point of $X$, let $f : X \to \mathbf{R}$ be a function,
and let $L$ be a real number. Then the following statements are logically
equivalent:

(a) $f$ is differentiable at $x_0$ on $X$ with derivative $L$.

(b) For every $\varepsilon > 0$, there exists a $\delta > 0$ such that $f(x)$ is $\varepsilon|x - x_0|$-close
to $f(x_0) + L(x - x_0)$ whenever $x \in X$ is $\delta$-close to $x_0$, i.e.,
we have

$$|f(x) - (f(x_0) + L(x - x_0))| \le \varepsilon |x - x_0|$$

whenever $x \in X$ and $|x - x_0| \le \delta$.

Remark 10.1.8. Newton’s approximation is of course named after the 
great scientist and mathematician Isaac Newton (1642-1727), one of the 
founders of differential and integral calculus. 

Proof. See Exercise 10.1.2. □ 

Remark 10.1.9. We can phrase Proposition 10.1.7 in a more informal
way: if $f$ is differentiable at $x_0$, then one has the approximation
$f(x) \approx f(x_0) + f'(x_0)(x - x_0)$, and conversely.
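
To make statement (b) of Proposition 10.1.7 concrete, here is a small spot check for the assumed example $f(x) = x^2$ at $x_0 = 3$ with $L = 6$. In that case the error is exactly $(x - x_0)^2$, so $\delta := \varepsilon$ works. The helper names `newton_error` and `satisfies_bound` are hypothetical, and sampling finitely many points illustrates the inequality without proving it.

```python
from fractions import Fraction


def newton_error(f, x0, L, x):
    """Return |f(x) - (f(x0) + L*(x - x0))|."""
    return abs(f(x) - (f(x0) + L * (x - x0)))


def satisfies_bound(f, x0, L, eps, delta, steps=50):
    """Check the bound error <= eps*|x - x0| on a grid of points within delta of x0."""
    for i in range(-steps, steps + 1):
        x = x0 + delta * Fraction(i, steps)
        if newton_error(f, x0, L, x) > eps * abs(x - x0):
            return False
    return True


def square(x):
    return x * x


eps = Fraction(1, 100)
print(satisfies_bound(square, Fraction(3), 6, eps, eps))  # slope 6 passes on this grid
print(satisfies_bound(square, Fraction(3), 5, eps, eps))  # slope 5 does not
```

With the wrong slope $L = 5$ the error is $|x - 3|\,|x - 2|$, which is roughly $|x - 3|$ near $3$ and so cannot be bounded by $\varepsilon |x - 3|$ for small $\varepsilon$.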

As the example of the function $f : \mathbf{R} \to \mathbf{R}$ defined by $f(x) := |x|$
shows, a function can be continuous at a point without being
differentiable at that point. However, the converse is true:

Proposition 10.1.10 (Differentiability implies continuity). Let $X$ be a
subset of $\mathbf{R}$, let $x_0 \in X$ be a limit point of $X$, and let $f : X \to \mathbf{R}$ be a
function. If $f$ is differentiable at $x_0$, then $f$ is also continuous at $x_0$.

Proof. See Exercise 10.1.3. □ 

Definition 10.1.11 (Differentiability on a domain). Let $X$ be a subset
of $\mathbf{R}$, and let $f : X \to \mathbf{R}$ be a function. We say that $f$ is differentiable
on $X$ if, for every limit point $x_0 \in X$, the function $f$ is differentiable at
$x_0$ on $X$.

From Proposition 10.1.10 and the above definition we have an
immediate corollary:

Corollary 10.1.12. Let $X$ be a subset of $\mathbf{R}$, and let $f : X \to \mathbf{R}$ be a
function which is differentiable on $X$. Then $f$ is also continuous on $X$.

Now we state the basic properties of derivatives which you are all
familiar with.

Theorem 10.1.13 (Differential calculus). Let $X$ be a subset of $\mathbf{R}$, let
$x_0 \in X$ be a limit point of $X$, and let $f : X \to \mathbf{R}$ and $g : X \to \mathbf{R}$ be
functions.

(a) If $f$ is a constant function, i.e., there exists a real number $c$ such
that $f(x) = c$ for all $x \in X$, then $f$ is differentiable at $x_0$ and
$f'(x_0) = 0$.

(b) If $f$ is the identity function, i.e., $f(x) = x$ for all $x \in X$, then $f$
is differentiable at $x_0$ and $f'(x_0) = 1$.

(c) (Sum rule) If $f$ and $g$ are differentiable at $x_0$, then $f + g$ is also
differentiable at $x_0$, and $(f + g)'(x_0) = f'(x_0) + g'(x_0)$.

(d) (Product rule) If $f$ and $g$ are differentiable at $x_0$, then $fg$ is also
differentiable at $x_0$, and $(fg)'(x_0) = f'(x_0)g(x_0) + f(x_0)g'(x_0)$.

(e) If $f$ is differentiable at $x_0$ and $c$ is a real number, then $cf$ is also
differentiable at $x_0$, and $(cf)'(x_0) = cf'(x_0)$.

(f) (Difference rule) If $f$ and $g$ are differentiable at $x_0$, then $f - g$ is
also differentiable at $x_0$, and $(f - g)'(x_0) = f'(x_0) - g'(x_0)$.

(g) If $g$ is differentiable at $x_0$, and $g$ is non-zero on $X$ (i.e., $g(x) \ne 0$
for all $x \in X$), then $1/g$ is also differentiable at $x_0$, and
$\left(\frac{1}{g}\right)'(x_0) = -\frac{g'(x_0)}{g(x_0)^2}$.

(h) (Quotient rule) If $f$ and $g$ are differentiable at $x_0$, and $g$ is
non-zero on $X$, then $f/g$ is also differentiable at $x_0$, and

$$\left(\frac{f}{g}\right)'(x_0) = \frac{f'(x_0)g(x_0) - f(x_0)g'(x_0)}{g(x_0)^2}.$$

Remark 10.1.14. The product rule is also known as the Leibniz rule,
after Gottfried Leibniz (1646-1716), who was the other founder of
differential and integral calculus besides Newton.

Proof. See Exercise 10.1.4. □

As you are well aware, the above rules allow one to compute many
derivatives easily. For instance, if $f : \mathbf{R} - \{1\} \to \mathbf{R}$ is the function
$f(x) := \frac{x-2}{x-1}$, then it is easy to use the above rules to show that
$f'(x_0) = \frac{1}{(x_0 - 1)^2}$ for all $x_0 \in \mathbf{R} - \{1\}$. (Why? Note that every point $x_0$ in $\mathbf{R} - \{1\}$
is a limit point of $\mathbf{R} - \{1\}$.)

Another fundamental property of differentiable functions is the
following:

Theorem 10.1.15 (Chain rule). Let $X, Y$ be subsets of $\mathbf{R}$, let $x_0 \in X$
be a limit point of $X$, and let $y_0 \in Y$ be a limit point of $Y$. Let $f : X \to Y$
be a function such that $f(x_0) = y_0$, and such that $f$ is differentiable at
$x_0$. Suppose that $g : Y \to \mathbf{R}$ is a function which is differentiable at $y_0$.
Then the function $g \circ f : X \to \mathbf{R}$ is differentiable at $x_0$, and

$$(g \circ f)'(x_0) = g'(y_0) f'(x_0).$$

Proof. See Exercise 10.1.7. □ 

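Example 10.1.16. If $f : \mathbf{R} - \{1\} \to \mathbf{R}$ is the function $f(x) := \frac{x-2}{x-1}$,
and $g : \mathbf{R} \to \mathbf{R}$ is the function $g(y) := y^2$, then $g \circ f(x) = \left(\frac{x-2}{x-1}\right)^2$, and
the chain rule gives

$$(g \circ f)'(x_0) = 2 \left( \frac{x_0 - 2}{x_0 - 1} \right) \frac{1}{(x_0 - 1)^2}.$$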
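
The chain rule value can be compared against a difference quotient of the composite function. The sketch below assumes $f(x) := (x-2)/(x-1)$ and $g(y) := y^2$, together with $f'(x) = 1/(x-1)^2$ and $g'(y) = 2y$; the helper names are hypothetical and the comparison is a numerical sanity check, not a proof.

```python
from fractions import Fraction


def f(x):
    return (x - 2) / (x - 1)


def g(y):
    return y * y


def chain_rule_value(x0):
    """g'(f(x0)) * f'(x0), using g'(y) = 2*y and f'(x) = 1 / (x - 1)**2."""
    return 2 * f(x0) * (1 / (x0 - 1) ** 2)


def difference_quotient(x0, h):
    """Difference quotient of the composite g(f(x)) at x0 with step h."""
    return (g(f(x0 + h)) - g(f(x0))) / h


x0 = Fraction(3)
print(chain_rule_value(x0))
for k in (2, 4, 6):
    h = Fraction(1, 10 ** k)
    print(float(difference_quotient(x0, h)))
```

At $x_0 = 3$ the formula gives $2 \cdot \tfrac{1}{2} \cdot \tfrac{1}{4} = \tfrac{1}{4}$, and the difference quotients approach that value as $h$ decreases.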

Remark 10.1.17. If one writes $y$ for $f(x)$, and $z$ for $g(y)$, then the
chain rule can be written in the more visually appealing manner
$\frac{dz}{dx} = \frac{dz}{dy} \frac{dy}{dx}$. However, this notation can be misleading (for instance it blurs
the distinction between dependent variable and independent variable,
especially for $y$), and leads one to believe that the quantities $dz$, $dy$, $dx$
can be manipulated like real numbers. However, these quantities are
not real numbers (in fact, we have not assigned any meaning to them
at all), and treating them as such can lead to problems in the future.
For instance, if $f$ depends on $x_1$ and $x_2$, which depend on $t$, then the chain
rule for several variables asserts that
$\frac{df}{dt} = \frac{\partial f}{\partial x_1} \frac{dx_1}{dt} + \frac{\partial f}{\partial x_2} \frac{dx_2}{dt}$, but this rule
might seem suspect if one treated $df$, $dt$, etc. as real numbers. It is
possible to think of $dy$, $dx$, etc. as “infinitesimal real numbers” if one
knows what one is doing, but for those just starting out in analysis, I
would not recommend this approach, especially if one wishes to work
rigorously. (There is a way to make all of this rigorous, even for the
calculus of several variables, but it requires the notion of a tangent
vector, and the derivative map, both of which are beyond the scope of
this text.)


— Exercises — 

Exercise 10.1.1. Suppose that $X$ is a subset of $\mathbf{R}$, $x_0$ is a limit point of $X$, and
$f : X \to \mathbf{R}$ is a function which is differentiable at $x_0$. Let $Y \subset X$ be such that
$x_0 \in Y$, and $x_0$ is also a limit point of $Y$. Prove that the restricted function
$f|_Y : Y \to \mathbf{R}$ is also differentiable at $x_0$, and has the same derivative as $f$ at
$x_0$. Explain why this does not contradict the discussion in Remark 10.1.2.

Exercise 10.1.2. Prove Proposition 10.1.7. (Hint: the cases $x = x_0$ and $x \ne x_0$
have to be treated separately.)

Exercise 10.1.3. Prove Proposition 10.1.10. (Hint: either use the limit laws
(Proposition 9.3.14), or use Proposition 10.1.7.)

Exercise 10.1.4. Prove Theorem 10.1.13. (Hint: use the limit laws in
Proposition 9.3.14. Use earlier parts of this theorem to prove the latter. For the
product rule, use the identity

$$f(x)g(x) - f(x_0)g(x_0) = f(x)g(x) - f(x)g(x_0) + f(x)g(x_0) - f(x_0)g(x_0)$$
$$= f(x)(g(x) - g(x_0)) + (f(x) - f(x_0))g(x_0);$$

this trick of adding and subtracting an intermediate term is sometimes known
as the “middle-man trick” and is very useful in analysis.)

Exercise 10.1.5. Let $n$ be a natural number, and let $f : \mathbf{R} \to \mathbf{R}$ be the function
$f(x) := x^n$. Show that $f$ is differentiable on $\mathbf{R}$ and $f'(x) = nx^{n-1}$ for all
$x \in \mathbf{R}$. (Hint: use Theorem 10.1.13 and induction.)

Exercise 10.1.6. Let $n$ be a negative integer, and let $f : \mathbf{R} - \{0\} \to \mathbf{R}$ be the
function $f(x) := x^n$. Show that $f$ is differentiable on $\mathbf{R} - \{0\}$ and $f'(x) = nx^{n-1}$ for
all $x \in \mathbf{R} - \{0\}$. (Hint: use Theorem 10.1.13 and Exercise 10.1.5.)

Exercise 10.1.7. Prove Theorem 10.1.15. (Hint: one way to do this is via
Newton's approximation, Proposition 10.1.7. Another way is to use Proposition
9.3.9 and Proposition 10.1.10 to convert this problem into one involving
limits of sequences; however, with the latter strategy one has to treat the case
$f'(x_0) = 0$ separately, as some division-by-zero subtleties can occur in that
case.)

10.2 Local maxima, local minima, and derivatives 

As you learnt in your basic calculus courses, one very common
application of derivatives is to locate maxima and minima. We now
present this material again, but this time in a rigorous manner.

The notion of a function $f : X \to \mathbf{R}$ attaining a maximum or
minimum at a point $x_0 \in X$ was defined in Definition 9.6.5. We now localize
this definition:

Definition 10.2.1 (Local maxima and minima). Let $f : X \to \mathbf{R}$ be
a function, and let $x_0 \in X$. We say that $f$ attains a local maximum
at $x_0$ iff there exists a $\delta > 0$ such that the restriction $f|_{X \cap (x_0 - \delta, x_0 + \delta)}$
of $f$ to $X \cap (x_0 - \delta, x_0 + \delta)$ attains a maximum at $x_0$. We say that
$f$ attains a local minimum at $x_0$ iff there exists a $\delta > 0$ such that the
restriction $f|_{X \cap (x_0 - \delta, x_0 + \delta)}$ of $f$ to $X \cap (x_0 - \delta, x_0 + \delta)$ attains a minimum
at $x_0$.


Remark 10.2.2. If $f$ attains a maximum at $x_0$, we sometimes say that $f$ attains a global maximum at $x_0$, in order to distinguish it from the local maxima defined here. Note that if $f$ attains a global maximum at $x_0$, then it certainly also attains a local maximum at this $x_0$, and similarly for minima.

Example 10.2.3. Let $f : \mathbf{R} \to \mathbf{R}$ denote the function $f(x) := x^2 - x^4$. This function does not attain a global minimum at $0$, since for example $f(2) = -12 < 0 = f(0)$; however, it does attain a local minimum, for if we choose $\delta := 1$ and restrict $f$ to the interval $(-1, 1)$, then for all $x \in (-1, 1)$ we have $x^4 \le x^2$ and thus $f(x) = x^2 - x^4 \ge 0 = f(0)$, and so $f|_{(-1,1)}$ has a local minimum at $0$.
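
The two claims in Example 10.2.3 can be spot-checked numerically. The sketch below is only an illustration, not a proof: it evaluates $f$ exactly (using rational arithmetic) on a finite grid inside $(-1, 1)$, and the grid size `N` is an arbitrary choice made here.

```python
from fractions import Fraction


def f(x):
    """The function f(x) = x^2 - x^4 from Example 10.2.3."""
    return x**2 - x**4


# 0 is not a global minimum: f(2) lies strictly below f(0).
f_at_0 = f(Fraction(0))
f_at_2 = f(Fraction(2))

# Sample the open interval (-1, 1) on a finite grid (endpoints excluded).
# N is an illustrative choice; a grid check cannot replace the argument
# x^4 <= x^2 given in the text.
N = 1000
grid = [Fraction(k, N) for k in range(-N + 1, N)]
local_min_holds_on_grid = all(f(x) >= f_at_0 for x in grid)
min_on_grid = min(f(x) for x in grid)
```

Exact fractions are used so that the comparison `f(x) >= f(0)` is not affected by floating-point rounding.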

Example 10.2.4. Let $f : \mathbf{Z} \to \mathbf{R}$ be the function $f(x) = x$, defined on the integers only. Then $f$ has no global maximum or global minimum (why?), but attains both a local maximum and local minimum at every integer $n$ (why?).

Remark 10.2.5. If $f : X \to \mathbf{R}$ attains a local maximum at a point $x_0$ in $X$, and $Y \subset X$ is a subset of $X$ which contains $x_0$, then the restriction $f|_Y : Y \to \mathbf{R}$ also attains a local maximum at $x_0$ (why?). Similarly for minima.

The connection between local maxima, minima and derivatives is the 
following. 

Proposition 10.2.6 (Local extrema are stationary). Let $a < b$ be real numbers, and let $f : (a, b) \to \mathbf{R}$ be a function. If $x_0 \in (a, b)$, $f$ is differentiable at $x_0$, and $f$ attains either a local maximum or local minimum at $x_0$, then $f'(x_0) = 0$.

Proof. See Exercise 10.2.1. □ 

Note that $f$ must be differentiable for this proposition to work; see Exercise 10.2.2. Also, this proposition does not work if the open interval $(a, b)$ is replaced by a closed interval $[a, b]$. For instance, the function $f : [1, 2] \to \mathbf{R}$ defined by $f(x) := x$ has a local maximum at $x_0 = 2$ and a local minimum at $x_0 = 1$ (in fact, these local extrema are global extrema), but at both points the derivative is $f'(x_0) = 1$, not $f'(x_0) = 0$. Thus the endpoints of an interval can be local maxima or minima even if the derivative is not zero there. Finally, the converse of this proposition is false (Exercise 10.2.3).

By combining Proposition 10.2.6 with the maximum principle, one 
can obtain 

Theorem 10.2.7 (Rolle’s theorem). Let $a < b$ be real numbers, and let $g : [a, b] \to \mathbf{R}$ be a continuous function which is differentiable on $(a, b)$. Suppose also that $g(a) = g(b)$. Then there exists an $x \in (a, b)$ such that $g'(x) = 0$.

Proof. See Exercise 10.2.4. □ 

Remark 10.2.8. Note that we only assume $g$ is differentiable on the open interval $(a, b)$, though of course the theorem also holds if we assume $g$ is differentiable on the closed interval $[a, b]$, since this is larger than $(a, b)$.

Rolle’s theorem has an important corollary. 

Corollary 10.2.9 (Mean value theorem). Let $a < b$ be real numbers, and let $f : [a, b] \to \mathbf{R}$ be a function which is continuous on $[a, b]$ and differentiable on $(a, b)$. Then there exists an $x \in (a, b)$ such that

$$f'(x) = \frac{f(b) - f(a)}{b - a}.$$

Proof. See Exercise 10.2.5. □ 
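
As a numerical illustration of Corollary 10.2.9 (not a proof), the sketch below locates a point $x$ with $f'(x) = (f(b) - f(a))/(b - a)$ for the sample function $f(x) = x^3$ on $[0, 2]$. The helper name `mean_value_point` and the use of bisection are choices made here; bisection additionally assumes that $f'$ is continuous and that $f'(x)$ minus the slope changes sign on $[a, b]$, which the mean value theorem itself does not require.

```python
import math


def mean_value_point(f_prime, slope, lo, hi, tol=1e-12):
    """Bisection for a root of f'(x) - slope on [lo, hi].

    Assumes f' is continuous and f' - slope changes sign on [lo, hi];
    these are assumptions of this sketch, not of the theorem.
    """
    def g(x):
        return f_prime(x) - slope

    if g(lo) * g(hi) > 0:
        raise ValueError("f' - slope does not change sign on [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


def f(x):
    return x**3


def f_prime(x):
    return 3 * x**2


a, b = 0.0, 2.0
slope = (f(b) - f(a)) / (b - a)      # (8 - 0) / 2 = 4
x_star = mean_value_point(f_prime, slope, a, b)
expected = 2 / math.sqrt(3)          # solves 3 x^2 = 4 on (0, 2)
```

For this particular $f$ the point can be found by hand as $x = 2/\sqrt{3}$, which the bisection result should match to within the tolerance.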


— Exercises — 

Exercise 10.2.1. Prove Proposition 10.2.6. 

Exercise 10.2.2. Give an example of a function $f : (-1, 1) \to \mathbf{R}$ which is continuous and attains a global maximum at $0$, but which is not differentiable at $0$. Explain why this does not contradict Proposition 10.2.6.

Exercise 10.2.3. Give an example of a function $f : (-1, 1) \to \mathbf{R}$ which is differentiable, and whose derivative equals $0$ at $0$, but such that $0$ is neither a local minimum nor a local maximum. Explain why this does not contradict Proposition 10.2.6.

Exercise 10.2.4. Prove Theorem 10.2.7. (Hint: use Corollary 10.1.12 and the maximum principle, Proposition 9.6.7, followed by Proposition 10.2.6. Note that the maximum principle does not tell you whether the maximum or minimum is in the open interval $(a, b)$ or is one of the boundary points $a$, $b$, so you have to divide into cases and use the hypothesis $g(a) = g(b)$ somehow.)

Exercise 10.2.5. Use Theorem 10.2.7 to prove Corollary 10.2.9. (Hint: consider a function of the form $f(x) - cx$ for some carefully chosen real number $c$.)

Exercise 10.2.6. Let $M > 0$, and let $f : [a, b] \to \mathbf{R}$ be a function which is continuous on $[a, b]$ and differentiable on $(a, b)$, and such that $|f'(x)| \le M$ for all $x \in (a, b)$ (i.e., the derivative of $f$ is bounded). Show that for any $x, y \in [a, b]$ we have the inequality $|f(x) - f(y)| \le M|x - y|$. (Hint: apply the mean value theorem (Corollary 10.2.9) to a suitable restriction of $f$.) Functions which obey the bound $|f(x) - f(y)| \le M|x - y|$ are known as Lipschitz continuous functions with Lipschitz constant $M$; thus this exercise shows that functions with bounded derivative are Lipschitz continuous.

Exercise 10.2.7. Let $f : \mathbf{R} \to \mathbf{R}$ be a differentiable function such that $f'$ is bounded. Show that $f$ is uniformly continuous. (Hint: use the preceding exercise.)

10.3 Monotone functions and derivatives 

In your elementary calculus courses, you may have come across the as- 
sertion that a positive derivative meant an increasing function, and a 
negative derivative meant a decreasing function. This statement is not 
completely accurate, but it is pretty close; we now give the precise ver- 
sion of these statements below. 

Proposition 10.3.1. Let $X$ be a subset of $\mathbf{R}$, let $x_0 \in X$ be a limit point of $X$, and let $f : X \to \mathbf{R}$ be a function. If $f$ is monotone increasing and $f$ is differentiable at $x_0$, then $f'(x_0) \ge 0$. If $f$ is monotone decreasing and $f$ is differentiable at $x_0$, then $f'(x_0) \le 0$.

Proof. See Exercise 10.3.1. □ 

Remark 10.3.2. We have to assume that $f$ is differentiable at $x_0$; there exist monotone functions which are not always differentiable (see Exercise 10.3.2), and of course if $f$ is not differentiable at $x_0$ we cannot possibly conclude that $f'(x_0) \ge 0$ or $f'(x_0) \le 0$.

One might naively guess that if $f$ were strictly monotone increasing, and $f$ was differentiable at $x_0$, then the derivative $f'(x_0)$ would be strictly positive instead of merely non-negative. Unfortunately, this is not always the case (Exercise 10.3.3).

On the other hand, we do have a converse result: if a function has strictly positive derivative, then it must be strictly monotone increasing:

Proposition 10.3.3. Let $a < b$, and let $f : [a, b] \to \mathbf{R}$ be a differentiable function. If $f'(x) > 0$ for all $x \in [a, b]$, then $f$ is strictly monotone increasing. If $f'(x) < 0$ for all $x \in [a, b]$, then $f$ is strictly monotone decreasing. If $f'(x) = 0$ for all $x \in [a, b]$, then $f$ is a constant function.

Proof. See Exercise 10.3.4. □ 


— Exercises — 

Exercise 10.3.1. Prove Proposition 10.3.1. 

Exercise 10.3.2. Give an example of a function $f : (-1, 1) \to \mathbf{R}$ which is continuous and monotone increasing, but which is not differentiable at $0$. Explain why this does not contradict Proposition 10.3.1.

Exercise 10.3.3. Give an example of a function $f : \mathbf{R} \to \mathbf{R}$ which is strictly monotone increasing and differentiable, but whose derivative at $0$ is zero. Explain why this does not contradict Proposition 10.3.1 or Proposition 10.3.3. (Hint: look at Exercise 10.2.3.)

Exercise 10.3.4. Prove Proposition 10.3.3. (Hint: you do not have integrals 
or the fundamental theorem of calculus yet, so these tools cannot be used. 
However, one can proceed via the mean-value theorem, Corollary 10.2.9.) 

Exercise 10.3.5. Give an example of a subset $X \subset \mathbf{R}$ and a function $f : X \to \mathbf{R}$ which is differentiable on $X$, is such that $f'(x) > 0$ for all $x \in X$, but $f$ is not strictly monotone increasing. (Hint: the conditions here are subtly different from those in Proposition 10.3.3. What is the difference, and how can one exploit that difference to obtain the example?)


10.4 Inverse functions and derivatives 

We now ask the following question: if we know that a function $f : X \to Y$ is differentiable, and it has an inverse $f^{-1} : Y \to X$, what can we say about the differentiability of $f^{-1}$? This will be useful for many applications, for instance if we want to differentiate the function $f(x) := x^{1/n}$.

We begin with a preliminary result.


Lemma 10.4.1. Let $f : X \to Y$ be an invertible function, with inverse $f^{-1} : Y \to X$. Suppose that $x_0 \in X$ and $y_0 \in Y$ are such that $y_0 = f(x_0)$ (which also implies that $x_0 = f^{-1}(y_0)$). If $f$ is differentiable at $x_0$, and $f^{-1}$ is differentiable at $y_0$, then

$$(f^{-1})'(y_0) = \frac{1}{f'(x_0)}.$$


Proof. From the chain rule (Theorem 10.1.15) we have

$$(f^{-1} \circ f)'(x_0) = (f^{-1})'(y_0) f'(x_0).$$

But $f^{-1} \circ f$ is the identity function on $X$, and hence by Theorem 10.1.13(b) $(f^{-1} \circ f)'(x_0) = 1$. The claim follows. □


As a particular corollary of Lemma 10.4.1, we see that if $f$ is differentiable at $x_0$ with $f'(x_0) = 0$, then $f^{-1}$ cannot be differentiable at $y_0 = f(x_0)$, since $1/f'(x_0)$ is undefined in that case. Thus for instance, the function $g : [0, \infty) \to [0, \infty)$ defined by $g(y) := y^{1/3}$ cannot be differentiable at $0$, since this function is the inverse $g = f^{-1}$ of the function $f : [0, \infty) \to [0, \infty)$ defined by $f(x) := x^3$, and this function has a derivative of $0$ at $f^{-1}(0) = 0$.

If one writes $y = f(x)$, so that $x = f^{-1}(y)$, then one can write the conclusion of Lemma 10.4.1 in the more appealing form $dx/dy = 1/(dy/dx)$. However, as mentioned before, this way of writing things, while very convenient and easy to remember, can be misleading and cause errors if applied too carelessly (especially when one begins to work in the calculus of several variables).

Lemma 10.4.1 seems to answer the question of how to differentiate the inverse of a function; however, it has one significant drawback: the lemma only works if one assumes a priori that $f^{-1}$ is differentiable. Thus, if one does not already know that $f^{-1}$ is differentiable, one cannot use Lemma 10.4.1 to compute the derivative of $f^{-1}$.

However, the following improved version of Lemma 10.4.1 will compensate for this fact, by relaxing the requirement on $f^{-1}$ from differentiability to continuity.

Theorem 10.4.2 (Inverse function theorem). Let $f : X \to Y$ be an invertible function, with inverse $f^{-1} : Y \to X$. Suppose that $x_0 \in X$ and $y_0 \in Y$ are such that $f(x_0) = y_0$. If $f$ is differentiable at $x_0$, $f^{-1}$ is continuous at $y_0$, and $f'(x_0) \neq 0$, then $f^{-1}$ is differentiable at $y_0$ and

$$(f^{-1})'(y_0) = \frac{1}{f'(x_0)}.$$


Proof. We have to show that

$$\lim_{y \to y_0;\, y \in Y - \{y_0\}} \frac{f^{-1}(y) - f^{-1}(y_0)}{y - y_0} = \frac{1}{f'(x_0)}.$$

By Proposition 9.3.9, it suffices to show that

$$\lim_{n \to \infty} \frac{f^{-1}(y_n) - f^{-1}(y_0)}{y_n - y_0} = \frac{1}{f'(x_0)}$$

for any sequence $(y_n)_{n=1}^\infty$ of elements in $Y - \{y_0\}$ which converge to $y_0$. To prove this, we set $x_n := f^{-1}(y_n)$. Then $(x_n)_{n=1}^\infty$ is a sequence of elements in $X - \{x_0\}$. (Why? Note that $f^{-1}$ is a bijection.) Since $f^{-1}$ is continuous by assumption, we know that $x_n = f^{-1}(y_n)$ converges to $f^{-1}(y_0) = x_0$ as $n \to \infty$. Thus, since $f$ is differentiable at $x_0$, we have (by Proposition 9.3.9 again)

$$\lim_{n \to \infty} \frac{f(x_n) - f(x_0)}{x_n - x_0} = f'(x_0).$$

But since $x_n \neq x_0$ and $f$ is a bijection, the fraction $\frac{f(x_n) - f(x_0)}{x_n - x_0}$ is non-zero. Also, by hypothesis $f'(x_0)$ is non-zero. So by limit laws

$$\lim_{n \to \infty} \frac{x_n - x_0}{f(x_n) - f(x_0)} = \frac{1}{f'(x_0)}.$$

But since $x_n = f^{-1}(y_n)$ and $x_0 = f^{-1}(y_0)$, we thus have

$$\lim_{n \to \infty} \frac{f^{-1}(y_n) - f^{-1}(y_0)}{y_n - y_0} = \frac{1}{f'(x_0)}$$

as desired. □

We give some applications of the inverse function theorem in the 
exercises below. 
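
Before turning to the exercises, here is a small numerical sanity check of the formula $(f^{-1})'(y_0) = 1/f'(x_0)$, using the sample function $f(x) := x^3$ on $(0, \infty)$ with $x_0 = 2$ and $y_0 = 8$. It is an illustrative sketch only: the step sizes are arbitrary choices, floating-point difference quotients are approximations, and the second computation revisits the earlier remark that $y^{1/3}$ is not differentiable at $0$.

```python
def f(x):
    return x**3


def f_prime(x):
    return 3 * x**2


def f_inv(y):
    # Cube root; only used here for y >= 0.
    return y ** (1.0 / 3.0)


x0 = 2.0
y0 = f(x0)                       # 8.0
predicted = 1.0 / f_prime(x0)    # 1/12, as given by Theorem 10.4.2


def difference_quotient(h):
    """(f_inv(y0 + h) - f_inv(y0)) / h for a small non-zero h."""
    return (f_inv(y0 + h) - f_inv(y0)) / h


# Difference quotients of the inverse at y0 = 8 for shrinking h.
steps = [10.0**-k for k in range(1, 7)]
quotients = [difference_quotient(h) for h in steps]

# At y0 = 0 (where f'(0) = 0) the quotients g(h)/h = h^(-2/3) grow
# without bound instead of settling down.
quotients_at_zero = [f_inv(h) / h for h in steps]
```

The list `quotients` should approach `predicted`, while `quotients_at_zero` increases as the step shrinks, consistent with the failure of differentiability at $0$.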

— Exercises —

Exercise 10.4.1. Let $n \ge 1$ be a natural number, and let $g : (0, \infty) \to (0, \infty)$ be the function $g(x) := x^{1/n}$.

(a) Show that $g$ is continuous on $(0, \infty)$. (Hint: use Proposition 9.8.3.)

(b) Show that $g$ is differentiable on $(0, \infty)$, and that $g'(x) = \frac{1}{n} x^{\frac{1}{n} - 1}$ for all $x \in (0, \infty)$. (Hint: use the inverse function theorem and (a).)

Exercise 10.4.2. Let $q$ be a rational number, and let $f : (0, \infty) \to \mathbf{R}$ be the function $f(x) = x^q$.

(a) Show that $f$ is differentiable on $(0, \infty)$ and that $f'(x) = qx^{q-1}$. (Hint: use Exercise 10.4.1 and the laws of differential calculus in Theorem 10.1.13 and Theorem 10.1.15.)

(b) Show that $\lim_{x \to 1;\, x \in (0,\infty) \setminus \{1\}} \frac{x^q - 1}{x - 1} = q$ for every rational number $q$. (Hint: use part (a) and Definition 10.1.1. An alternate route is to apply L’Hopital’s rule from the next section.)

Exercise 10.4.3. Let $\alpha$ be a real number, and let $f : (0, \infty) \to \mathbf{R}$ be the function $f(x) = x^\alpha$.

(a) Show that $\lim_{x \to 1;\, x \in (0,\infty) \setminus \{1\}} \frac{x^\alpha - 1}{x - 1} = \alpha$. (Hint: use Exercise 10.4.2 and the comparison principle; you may need to consider right and left limits separately. Proposition 5.4.14 may also be helpful.)

(b) Show that $f$ is differentiable on $(0, \infty)$ and that $f'(x) = \alpha x^{\alpha - 1}$. (Hint: use (a), exponent laws (Proposition 6.7.3), and Definition 10.1.1.)

10.5 L’Hopital’s rule 

Finally, we present a version of a rule you are all familiar with. 

Proposition 10.5.1 (L’Hopital’s rule I). Let $X$ be a subset of $\mathbf{R}$, let $f : X \to \mathbf{R}$ and $g : X \to \mathbf{R}$ be functions, and let $x_0 \in X$ be a limit point of $X$. Suppose that $f(x_0) = g(x_0) = 0$, that $f$ and $g$ are both differentiable at $x_0$, but $g'(x_0) \neq 0$. Then there exists a $\delta > 0$ such that $g(x) \neq 0$ for all $x \in (X \cap (x_0 - \delta, x_0 + \delta)) - \{x_0\}$, and

$$\lim_{x \to x_0;\, x \in (X \cap (x_0 - \delta, x_0 + \delta)) - \{x_0\}} \frac{f(x)}{g(x)} = \frac{f'(x_0)}{g'(x_0)}.$$

Proof. See Exercise 10.5.1. □ 

The presence of the $\delta$ here may seem somewhat strange, but is needed because $g(x)$ might vanish at some points other than $x_0$, which would imply that the quotient $\frac{f(x)}{g(x)}$ is not necessarily defined at all points in $X - \{x_0\}$.

A more sophisticated version of L’Hopital’s rule is the following. 

Proposition 10.5.2 (L’Hopital’s rule II). Let $a < b$ be real numbers, let $f : [a, b] \to \mathbf{R}$ and $g : [a, b] \to \mathbf{R}$ be functions which are differentiable on $[a, b]$. Suppose that $f(a) = g(a) = 0$, that $g'$ is non-zero on $[a, b]$ (i.e., $g'(x) \neq 0$ for all $x \in [a, b]$), and $\lim_{x \to a;\, x \in (a,b]} \frac{f'(x)}{g'(x)}$ exists and equals $L$. Then $g(x) \neq 0$ for all $x \in (a, b]$, and $\lim_{x \to a;\, x \in (a,b]} \frac{f(x)}{g(x)}$ exists and equals $L$.

Remark 10.5.3. This proposition only considers limits to the right of $a$, but one can easily state and prove a similar proposition for limits to the left of $a$, or around both sides of $a$. Speaking very informally, the proposition states that

$$\lim_{x \to a} \frac{f(x)}{g(x)} = \lim_{x \to a} \frac{f'(x)}{g'(x)},$$

though one has to ensure all of the conditions of the proposition hold (in particular, that $f(a) = g(a) = 0$, and that the right-hand limit exists), before one can apply L’Hopital’s rule.


Proof. (Optional) We first show that $g(x) \neq 0$ for all $x \in (a, b]$. Suppose for sake of contradiction that $g(x) = 0$ for some $x \in (a, b]$. But since $g(a)$ is also zero, we can apply Rolle’s theorem to obtain $g'(y) = 0$ for some $a < y < x$, but this contradicts the hypothesis that $g'$ is non-zero on $[a, b]$.

Now we show that $\lim_{x \to a;\, x \in (a,b]} \frac{f(x)}{g(x)} = L$. By Proposition 9.3.9, it will suffice to show that

$$\lim_{n \to \infty} \frac{f(x_n)}{g(x_n)} = L$$

for any sequence $(x_n)_{n=1}^\infty$ taking values in $(a, b]$ which converges to $a$.

Consider a single $x_n$, and consider the function $h_n : [a, x_n] \to \mathbf{R}$ defined by

$$h_n(x) := f(x)g(x_n) - g(x)f(x_n).$$

Observe that $h_n$ is continuous on $[a, x_n]$ and equals $0$ at both $a$ and $x_n$, and is differentiable on $(a, x_n)$ with derivative $h_n'(x) = f'(x)g(x_n) - g'(x)f(x_n)$. (Note that $f(x_n)$ and $g(x_n)$ are constants with respect to $x$.) By Rolle’s theorem (Theorem 10.2.7), we can thus find $y_n \in (a, x_n)$ such that $h_n'(y_n) = 0$, which implies that

$$\frac{f(x_n)}{g(x_n)} = \frac{f'(y_n)}{g'(y_n)}.$$

Since $y_n \in (a, x_n)$ for all $n$, and $x_n$ converges to $a$ as $n \to \infty$, we see from the squeeze test (Corollary 6.4.14) that $y_n$ also converges to $a$ as $n \to \infty$. Thus $\frac{f'(y_n)}{g'(y_n)}$ converges to $L$, and thus $\frac{f(x_n)}{g(x_n)}$ also converges to $L$, as desired. □
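
To see Proposition 10.5.2 at work on a concrete pair of functions, the sketch below takes $f(x) := e^x - 1$ and $g(x) := \sin x$ on $[0, 1]$; these functions are chosen here purely for illustration and do not come from the text. Both vanish at $a = 0$, and $g'(x) = \cos x$ is non-zero on $[0, 1]$, so the hypotheses hold with $L = 1$. The code only tabulates the two ratios along one sequence $x_n \to 0$, which illustrates but does not prove the limit.

```python
import math


def f(x):
    return math.expm1(x)     # e^x - 1, computed accurately for small x


def g(x):
    return math.sin(x)


def f_prime(x):
    return math.exp(x)


def g_prime(x):
    return math.cos(x)


# One sequence x_n in (0, 1] converging to a = 0 (an illustrative choice).
xs = [10.0**-k for k in range(1, 8)]

# Hypothesis check on the sample points: g' does not vanish there.
g_prime_nonzero = all(g_prime(x) != 0 for x in xs)

ratio = [f(x) / g(x) for x in xs]                       # f(x_n) / g(x_n)
ratio_of_derivatives = [f_prime(x) / g_prime(x) for x in xs]
```

Both lists should tend to $1$ as $x_n$ shrinks, in line with the conclusion of the proposition for this example.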


— Exercises — 

Exercise 10.5.1. Prove Proposition 10.5.1. (Hint: to show that $g(x) \neq 0$ near $x_0$, you may wish to use Newton’s approximation (Proposition 10.1.7). For the rest of the proposition, use limit laws, Proposition 9.3.14.)

Exercise 10.5.2. Explain why Exercise 1.2.12 does not contradict either of the 
propositions in this section. 

The Riemann integral


In the previous chapter we reviewed differentiation - one of the two pillars of single variable calculus. The other pillar is, of course, integration, which is the focus of the current chapter. More precisely, we will turn to the definite integral, the integral of a function on a fixed interval, as opposed to the indefinite integral, otherwise known as the antiderivative. These two are of course linked by the Fundamental theorem of calculus, of which more will be said later.

For us, the study of the definite integral will start with an interval $I$ which could be open, closed, or half-open, and a function $f : I \to \mathbf{R}$, and will lead us to a number $\int_I f$; we can write this integral as $\int_I f(x)\,dx$ (of course, we could replace $x$ by any other dummy variable), or if $I$ has endpoints $a$ and $b$, we shall also write this integral as $\int_a^b f$ or $\int_a^b f(x)\,dx$.

To actually define this integral $\int_I f$ is somewhat delicate (especially if one does not want to assume any axioms concerning geometric notions such as area), and not all functions $f$ are integrable. It turns out that there are at least two ways to define this integral: the Riemann integral, named after Georg Riemann (1826-1866), which we will do here and which suffices for most applications, and the Lebesgue integral, named after Henri Lebesgue (1875-1941), which supersedes the Riemann integral and works for a much larger class of functions. The Lebesgue integral will be constructed in Chapter 11.45. There is also the Riemann-Stieltjes integral $\int_I f(x)\,d\alpha(x)$, a generalization of the Riemann integral due to Thomas Stieltjes (1856-1894), which we will discuss in Section 11.8.

Our strategy in defining the Riemann integral is as follows. We begin 
by first defining a notion of integration on a very simple class of functions 
- the piecewise constant functions. These functions are quite primitive, 
but their advantage is that integration is very easy for these functions, 
as is verifying all the usual properties. Then, we handle more general 
functions by approximating them by piecewise constant functions. 

11.1 Partitions 

Before we can introduce the concept of an integral, we need to describe 
how one can partition a large interval into smaller intervals. In this 
chapter, all intervals will be bounded intervals (as opposed to the more 
general intervals defined in Definition 9.1.1). 

Definition 11.1.1. Let $X$ be a subset of $\mathbf{R}$. We say that $X$ is connected iff the following property is true: whenever $x, y$ are elements in $X$ such that $x < y$, the bounded interval $[x, y]$ is a subset of $X$ (i.e., every number between $x$ and $y$ is also in $X$).

Remark 11.1.2. Later on, in Section 11.11 we will define a more general 
notion of connectedness, which applies to any metric space. 

Examples 11.1.3. The set $[1, 2]$ is connected, because if $x < y$ both lie in $[1, 2]$, then $1 \le x < y \le 2$, and so every element between $x$ and $y$ also lies in $[1, 2]$. A similar argument shows that the set $(1, 2)$ is connected. However, the set $[1, 2] \cup [3, 4]$ is not connected (why?). The real line is connected (why?). The empty set, as well as singleton sets such as $\{3\}$, are connected, but for rather trivial reasons (these sets do not contain two elements $x, y$ for which $x < y$).

Lemma 11.1.4. Let X be a subset of the real line. Then the following 
two statements are logically equivalent: 

(a) X is bounded and connected. 

(b) X is a bounded interval. 

Proof. See Exercise 11.1.1. □ 

Remark 11.1.5. Recall that intervals are allowed to be singleton points 
(e.g., the degenerate interval [2,2] = {2}), or even the empty set. 

Corollary 11.1.6. If $I$ and $J$ are bounded intervals, then the intersection $I \cap J$ is also a bounded interval.

Proof. See Exercise 11.1.2. □

Example 11.1.7. The intersection of the bounded intervals $[2, 4]$ and $[4, 6]$ is $\{4\}$, which is also a bounded interval. The intersection of $(2, 4)$ and $(4, 6)$ is $\emptyset$.

We now give each bounded interval a length. 

Definition 11.1.8 (Length of intervals). If $I$ is a bounded interval, we define the length of $I$, denoted $|I|$, as follows. If $I$ is one of the intervals $[a, b]$, $(a, b)$, $[a, b)$, or $(a, b]$ for some real numbers $a < b$, then we define $|I| := b - a$. Otherwise, if $I$ is a point or the empty set, we define $|I| = 0$.

Example 11.1.9. For instance, the length of [3, 5] is 2, as is the length 
of (3, 5); meanwhile, the length of {5} or the empty set is 0. 

Definition 11.1.10 (Partitions). Let $I$ be a bounded interval. A partition of $I$ is a finite set $P$ of bounded intervals contained in $I$, such that every $x$ in $I$ lies in exactly one of the bounded intervals $J$ in $P$.

Remark 11.1.11. Note that a partition is a set of intervals, while each 
interval is itself a set of real numbers. Thus a partition is a set consisting 
of other sets. 

Examples 11.1.12. The set $P = \{\{1\}, (1, 3), [3, 5), \{5\}, (5, 8], \emptyset\}$ of bounded intervals is a partition of $[1, 8]$, because all the intervals in $P$ lie in $[1, 8]$, and each element of $[1, 8]$ lies in exactly one interval in $P$. Note that one could have removed the empty set from $P$ and still obtain a partition. However, the set $\{[1, 4], [3, 5]\}$ is not a partition of $[1, 5]$ because some elements of $[1, 5]$ are included in more than one interval in the set. The set $\{(1, 3), (3, 5)\}$ is not a partition of $(1, 5)$ because some elements of $(1, 5)$ are not included in any interval in the set. The set $\{(0, 3), [3, 5)\}$ is not a partition of $(1, 5)$ because some intervals in the set are not contained in $(1, 5)$.

Now we come to a basic property about length: 

Theorem 11.1.13 (Length is finitely additive). Let $I$ be a bounded interval, $n$ be a natural number, and let $P$ be a partition of $I$ of cardinality $n$. Then

$$|I| = \sum_{J \in P} |J|.$$

Proof. We prove this by induction on $n$. More precisely, we let $P(n)$ be the property that whenever $I$ is a bounded interval, and whenever $P$ is a partition of $I$ with cardinality $n$, then $|I| = \sum_{J \in P} |J|$.

The base case $P(0)$ is trivial; the only way that $I$ can be partitioned into an empty partition is if $I$ is itself empty (why?), at which point the claim is easy. The case $P(1)$ is also very easy; the only way that $I$ can be partitioned into a singleton set $\{J\}$ is if $J = I$ (why?), at which point the claim is again very easy.

Now suppose inductively that $P(n)$ is true for some $n \ge 1$, and now we prove $P(n+1)$. Let $I$ be a bounded interval, and let $P$ be a partition of $I$ of cardinality $n + 1$.

If $I$ is the empty set or a point, then all the intervals in $P$ must also be either the empty set or a point (why?), and so every interval has length zero and the claim is trivial. Thus we will assume that $I$ is an interval of the form $(a, b)$, $(a, b]$, $[a, b)$, or $[a, b]$.

Let us first suppose that $b \in I$, i.e., $I$ is either $(a, b]$ or $[a, b]$. Since $b \in I$, we know that one of the intervals $K$ in $P$ contains $b$. Since $K$ is contained in $I$, it must therefore be of the form $(c, b]$, $[c, b]$, or $\{b\}$ for some real number $c$, with $a \le c \le b$ (in the latter case of $K = \{b\}$, we set $c := b$). In particular, this means that the set $I - K$ is also an interval of the form $[a, c]$, $(a, c)$, $(a, c]$, $[a, c)$ when $c > a$, or a point or empty set when $a = c$. Either way, we easily see that

$$|I| = |K| + |I - K|.$$

On the other hand, since $P$ forms a partition of $I$, we see that $P - \{K\}$ forms a partition of $I - K$ (why?). By the induction hypothesis, we thus have

$$|I - K| = \sum_{J \in P - \{K\}} |J|.$$

Combining these two identities (and using the laws of addition for finite sets, see Proposition 7.1.11) we obtain

$$|I| = \sum_{J \in P} |J|$$

as desired.

Now suppose that $b \notin I$, i.e., $I$ is either $(a, b)$ or $[a, b)$. Then one of the intervals $K$ also is of the form $(c, b)$ or $[c, b)$ (see Exercise 11.1.3). In particular, this means that the set $I - K$ is also an interval of the form $[a, c]$, $(a, c)$, $(a, c]$, $[a, c)$ when $c > a$, or a point or empty set when $a = c$. The rest of the argument then proceeds as above. □




There are two more things we need to do with partitions. One is to 
say when one partition is finer than another, and the other is to talk 
about the common refinement of two partitions. 

Definition 11.1.14 (Finer and coarser partitions). Let $I$ be a bounded interval, and let $P$ and $P'$ be two partitions of $I$. We say that $P'$ is finer than $P$ (or equivalently, that $P$ is coarser than $P'$) if for every $J$ in $P'$, there exists a $K$ in $P$ such that $J \subseteq K$.

Example 11.1.15. The partition $\{[1, 2), \{2\}, (2, 3), [3, 4]\}$ is finer than $\{[1, 2], (2, 4]\}$ (why?). Both partitions are finer than $\{[1, 4]\}$, which is the coarsest possible partition of $[1, 4]$. Note that there is no such thing as a “finest” partition of $[1, 4]$. (Why? Recall all partitions are assumed to be finite.) We do not compare partitions of different intervals; for instance, if $P$ is a partition of $[1, 4]$ and $P'$ is a partition of $[2, 5]$ then we would not say that $P$ is coarser or finer than $P'$.

Definition 11.1.16 (Common refinement). Let $I$ be a bounded interval, and let $P$ and $P'$ be two partitions of $I$. We define the common refinement $P \# P'$ of $P$ and $P'$ to be the set

$$P \# P' := \{K \cap J : K \in P \text{ and } J \in P'\}.$$

Example 11.1.17. Let $P := \{[1, 3), [3, 4]\}$ and $P' := \{[1, 2], (2, 4]\}$ be two partitions of $[1, 4]$. Then $P \# P'$ is the set $\{[1, 2], (2, 3), [3, 4], \emptyset\}$ (why?).

Lemma 11.1.18. Let $I$ be a bounded interval, and let $P$ and $P'$ be two partitions of $I$. Then $P \# P'$ is also a partition of $I$, and is both finer than $P$ and finer than $P'$.

Proof. See Exercise 11.1.4. □ 
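
The common refinement of Example 11.1.17 can be computed mechanically. The sketch below is one possible encoding, not taken from the text: the class name `Interval`, its fields, and the canonical `EMPTY` value are all choices made here. An interval is stored as two endpoints plus two flags saying whether each endpoint is included; the code then forms all pairwise intersections as in Definition 11.1.16 and adds up lengths as in Definition 11.1.8.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A bounded interval; each endpoint is either included or excluded."""
    lo: int
    hi: int
    lo_closed: bool
    hi_closed: bool

    def is_empty(self):
        if self.lo > self.hi:
            return True
        if self.lo == self.hi:
            # [a, a] is the point {a}; (a, a), [a, a), (a, a] are empty.
            return not (self.lo_closed and self.hi_closed)
        return False

    def length(self):
        # Definition 11.1.8: b - a if a < b, and 0 for a point or empty set.
        return 0 if self.is_empty() else self.hi - self.lo

    def intersect(self, other):
        # Left endpoint: the larger one; on a tie, closed only if both are.
        if self.lo != other.lo:
            left = max(self, other, key=lambda i: i.lo)
            lo, lo_closed = left.lo, left.lo_closed
        else:
            lo, lo_closed = self.lo, self.lo_closed and other.lo_closed
        # Right endpoint: the smaller one; on a tie, closed only if both are.
        if self.hi != other.hi:
            right = min(self, other, key=lambda i: i.hi)
            hi, hi_closed = right.hi, right.hi_closed
        else:
            hi, hi_closed = self.hi, self.hi_closed and other.hi_closed
        result = Interval(lo, hi, lo_closed, hi_closed)
        return EMPTY if result.is_empty() else result


# One canonical representative, so that all empty intersections are equal.
EMPTY = Interval(0, 0, False, False)


def common_refinement(P, Q):
    """Definition 11.1.16: the set of all intersections K ∩ J."""
    return {K.intersect(J) for K in P for J in Q}


# Example 11.1.17: P = {[1,3), [3,4]} and P' = {[1,2], (2,4]}.
P = {Interval(1, 3, True, False), Interval(3, 4, True, True)}
P_prime = {Interval(1, 2, True, True), Interval(2, 4, False, True)}
refinement = common_refinement(P, P_prime)
total_length = sum(J.length() for J in refinement)
```

Here `refinement` should consist of $[1,2]$, $(2,3)$, $[3,4]$ and the empty set, and `total_length` should equal $|[1,4]| = 3$, as Theorem 11.1.13 predicts for any partition of $[1,4]$.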


— Exercises — 

Exercise 11.1.1. Prove Lemma 11.1.4. (Hint: in order to show that (a) implies (b) in the case when $X$ is non-empty, consider the supremum and infimum of $X$.)

Exercise 11.1.2. Prove Corollary 11.1.6. (Hint: use Lemma 11.1.4, and explain 
why the intersection of two bounded sets is automatically bounded, and why 
the intersection of two connected sets is automatically connected.) 

Exercise 11.1.3. Let $I$ be a bounded interval of the form $I = (a, b)$ or $I = [a, b)$ for some real numbers $a < b$. Let $I_1, \ldots, I_n$ be a partition of $I$. Prove that one of the intervals $I_j$ in this partition is of the form $I_j = (c, b)$ or $I_j = [c, b)$ for some $a \le c \le b$. (Hint: prove by contradiction. First show that if $I_j$ is not of the form $(c, b)$ or $[c, b)$ for any $a \le c \le b$, then $\sup I_j$ is strictly less than $b$.)

Exercise 11.1.4. Prove Lemma 11.1.18. 


11.2 Piecewise constant functions 


We can now describe the class of “simple” functions which we can inte- 
grate very easily. 

Definition 11.2.1 (Constant functions). Let $X$ be a subset of $\mathbf{R}$, and let $f : X \to \mathbf{R}$ be a function. We say that $f$ is constant iff there exists a real number $c$ such that $f(x) = c$ for all $x \in X$. If $E$ is a subset of $X$, we say that $f$ is constant on $E$ if the restriction $f|_E$ of $f$ to $E$ is constant, in other words there exists a real number $c$ such that $f(x) = c$ for all $x \in E$. We refer to $c$ as the constant value of $f$ on $E$.

Remark 11.2.2. If $E$ is a non-empty set, then a function $f$ which is constant on $E$ can have only one constant value; it is not possible for a function to always equal $3$ on $E$ while simultaneously always equalling $4$. However, if $E$ is empty, every real number $c$ is a constant value for $f$ on $E$ (why?).

Definition 11.2.3 (Piecewise constant functions I). Let $I$ be a bounded interval, let $f : I \to \mathbf{R}$ be a function, and let $P$ be a partition of $I$. We say that $f$ is piecewise constant with respect to $P$ if for every $J \in P$, $f$ is constant on $J$.


Example 11.2.4. The function $f : [1, 6] \to \mathbf{R}$ defined by

f(x) =

    if $1 \le x < 3$
    if $x = 3$
    if $3 < x < 6$
    if $x = 6$

is piecewise constant with respect to the partition $\{[1, 3), \{3\}, (3, 6), \{6\}\}$ of $[1, 6]$. Note that it is also piecewise constant with respect to some other partitions as well; for instance, it is piecewise constant with respect to the partition $\{[1, 2), \{2\}, (2, 3), \{3\}, (3, 5), [5, 6), \{6\}, \emptyset\}$.

Definition 11.2.5 (Piecewise constant functions II). Let $I$ be a bounded interval, and let $f : I \to \mathbf{R}$ be a function. We say that $f$ is piecewise constant on $I$ if there exists a partition $P$ of $I$ such that $f$ is piecewise constant with respect to $P$.

Example 11.2.6. The function used in the previous example is piece- 
wise constant on [1,6]. Also, every constant function on a bounded 
interval I is automatically piecewise constant (why?). 

Lemma 11.2.7. Let $I$ be a bounded interval, let $P$ be a partition of $I$, and let $f : I \to \mathbf{R}$ be a function which is piecewise constant with respect to $P$. Let $P'$ be a partition of $I$ which is finer than $P$. Then $f$ is also piecewise constant with respect to $P'$.

Proof. See Exercise 11.2.1. □ 

The space of piecewise constant functions is closed under algebraic 
operations: 

Lemma 11.2.8. Let $I$ be a bounded interval, and let $f : I \to \mathbf{R}$ and $g : I \to \mathbf{R}$ be piecewise constant functions on $I$. Then the functions $f + g$, $f - g$, $\max(f, g)$ and $fg$ are also piecewise constant functions on $I$. Here of course $\max(f, g) : I \to \mathbf{R}$ is the function $\max(f, g)(x) := \max(f(x), g(x))$. If $g$ does not vanish anywhere on $I$ (i.e., $g(x) \neq 0$ for all $x \in I$) then $f/g$ is also a piecewise constant function on $I$.

Proof. See Exercise 11.2.2. □ 


We are now ready to integrate piecewise constant functions. We be- 
gin with a temporary definition of an integral with respect to a partition. 


Definition 11.2.9 (Piecewise constant integral I). Let $I$ be a bounded interval, let $P$ be a partition of $I$. Let $f : I \to \mathbf{R}$ be a function which is piecewise constant with respect to $P$. Then we define the piecewise constant integral $p.c. \int_{[P]} f$ of $f$ with respect to the partition $P$ by the formula

$$p.c. \int_{[P]} f := \sum_{J \in P} c_J |J|,$$

where for each $J$ in $P$, we let $c_J$ be the constant value of $f$ on $J$.


Remark 11.2.10. This definition seems like it could be ill-defined, because if $J$ is empty then every number $c_J$ can be the constant value of $f$ on $J$, but fortunately in such cases $|J|$ is zero and so the choice of $c_J$ is irrelevant. The notation $p.c. \int_{[P]} f$ is rather artificial, but we shall only need it temporarily, en route to a more useful definition. Note that since $P$ is finite, the sum $\sum_{J \in P} c_J |J|$ is always well-defined (it is never divergent or infinite).

Remark 11.2.11. The piecewise constant integral corresponds intuitively to one’s notion of area, given that the area of a rectangle ought to be the product of the lengths of the sides. (Of course, if $f$ is negative somewhere, then the “area” $c_J |J|$ would also be negative.)

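Example 11.2.12. Let $f : [1, 4] \to \mathbf{R}$ be the function

$$f(x) = \begin{cases} 2 & \text{if } 1 \le x < 3 \\ 4 & \text{if } x = 3 \\ 6 & \text{if } 3 < x \le 4 \end{cases}$$

and let $P := \{[1, 3), \{3\}, (3, 4]\}$. Then

$$\begin{aligned}
p.c. \int_{[P]} f &= c_{[1,3)} |[1,3)| + c_{\{3\}} |\{3\}| + c_{(3,4]} |(3,4]| \\
&= 2 \times 2 + 4 \times 0 + 6 \times 1 \\
&= 10.
\end{aligned}$$

Alternatively, if we let $P' := \{[1, 2), [2, 3), \{3\}, (3, 4], \emptyset\}$ then

$$\begin{aligned}
p.c. \int_{[P']} f &= c_{[1,2)} |[1,2)| + c_{[2,3)} |[2,3)| + c_{\{3\}} |\{3\}| + c_{(3,4]} |(3,4]| + c_\emptyset |\emptyset| \\
&= 2 \times 1 + 2 \times 1 + 4 \times 0 + 6 \times 1 + c_\emptyset \times 0 \\
&= 10.
\end{aligned}$$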
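
The arithmetic in Example 11.2.12 can be replayed in a few lines. In the sketch below each interval $J$ of a partition is represented only by the pair (constant value $c_J$, length $|J|$); this encoding and the function name `pc_integral` are choices made here for illustration. The value assigned to the empty interval is arbitrary, since it is multiplied by a length of zero.

```python
def pc_integral(pieces):
    """Sum of c_J * |J| over pieces given as (constant value, length) pairs."""
    return sum(c * length for c, length in pieces)


# P = {[1,3), {3}, (3,4]} with constant values 2, 4, 6.
pieces_P = [(2, 2), (4, 0), (6, 1)]

# P' = {[1,2), [2,3), {3}, (3,4], empty set}; the value 123 on the empty
# interval is an arbitrary placeholder and does not affect the sum.
pieces_P_prime = [(2, 1), (2, 1), (4, 0), (6, 1), (123, 0)]

integral_P = pc_integral(pieces_P)
integral_P_prime = pc_integral(pieces_P_prime)
```

Both sums evaluate to 10, matching the two computations in the example.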

This example suggests that this integral does not really depend on 
what partition you pick, so long as your function is piecewise constant 
with respect to that partition. That is indeed true: 

Proposition 11.2.13 (Piecewise constant integral is independent of partition). Let $I$ be a bounded interval, and let $f : I \to \mathbf{R}$ be a function. Suppose that $P$ and $P'$ are partitions of $I$ such that $f$ is piecewise constant both with respect to $P$ and with respect to $P'$. Then $p.c. \int_{[P]} f = p.c. \int_{[P']} f$.

Proof. See Exercise 11.2.3. □

Because of this proposition, we can now make the following definition:

Definition 11.2.14 (Piecewise constant integral II). Let I be a bounded 
interval, and let / : I — > R be a piecewise constant function on I. We 
define the piecewise constant integral p.c. fj f by the formula 


p.c. f ■= p.c. / /, 

Ji J[ P] 

where P is any partition of I with respect to which / is piecewise con- 
stant. (Note that Proposition 11.2.13 tells us that the precise choice of 
this partition is irrelevant.) 

Example 11.2.15. If / is the function given in Example 11.2.12, then 

P- c ■ f[ 1 , 4 ] / = 10 - 

We now give some basic properties of the piecewise constant integral.
These laws will eventually be superseded by the corresponding laws for
the Riemann integral (Theorem 11.4.1).

Theorem 11.2.16 (Laws of integration). Let $I$ be a bounded interval,
and let $f : I \to \mathbf{R}$ and $g : I \to \mathbf{R}$ be piecewise constant
functions on $I$.

(a) We have $\text{p.c.}\int_I (f + g) = \text{p.c.}\int_I f + \text{p.c.}\int_I g$.

(b) For any real number $c$, we have $\text{p.c.}\int_I (cf) = c(\text{p.c.}\int_I f)$.

(c) We have $\text{p.c.}\int_I (f - g) = \text{p.c.}\int_I f - \text{p.c.}\int_I g$.

(d) If $f(x) \ge 0$ for all $x \in I$, then $\text{p.c.}\int_I f \ge 0$.

(e) If $f(x) \ge g(x)$ for all $x \in I$, then $\text{p.c.}\int_I f \ge \text{p.c.}\int_I g$.

(f) If $f$ is the constant function $f(x) = c$ for all $x$ in $I$, then
$\text{p.c.}\int_I f = c|I|$.

(g) Let $J$ be a bounded interval containing $I$ (i.e., $I \subseteq J$), and
let $F : J \to \mathbf{R}$ be the function

\[ F(x) := \begin{cases} f(x) & \text{if } x \in I \\ 0 & \text{if } x \notin I. \end{cases} \]

Then $F$ is piecewise constant on $J$, and $\text{p.c.}\int_J F = \text{p.c.}\int_I f$.

(h) Suppose that $\{J, K\}$ is a partition of $I$ into two intervals $J$ and
$K$. Then the functions $f|_J : J \to \mathbf{R}$ and $f|_K : K \to \mathbf{R}$
are piecewise constant on $J$ and $K$ respectively, and we have

\[ \text{p.c.}\int_I f = \text{p.c.}\int_J f|_J + \text{p.c.}\int_K f|_K. \]

Proof. See Exercise 11.2.4. □
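Laws (a), (b) and (f) of Theorem 11.2.16 can be spot-checked on a concrete
example once both functions are expressed on a common partition. The sketch
below is a sanity check, not a proof: the function $g$, the constant $c$, and
the helper `pc_integral` are hypothetical choices made for illustration.

```python
from fractions import Fraction

def pc_integral(lengths, values):
    """Sum c_J * |J| for a function given by its value on each piece."""
    return sum((Fraction(l) * Fraction(v) for l, v in zip(lengths, values)),
               Fraction(0))

# Common partition {[1,2), [2,3), {3}, (3,4]} of I = [1,4], so |I| = 3.
lengths = [1, 1, 0, 1]
f = [2, 2, 4, 6]      # the function of Example 11.2.12
g = [1, -3, 5, 0]     # hypothetical second function, chosen for illustration
c = Fraction(-7, 2)   # hypothetical scalar

sum_fg = pc_integral(lengths, [a + b for a, b in zip(f, g)])
law_a_holds = sum_fg == pc_integral(lengths, f) + pc_integral(lengths, g)
law_b_holds = pc_integral(lengths, [c * a for a in f]) == c * pc_integral(lengths, f)
# Law (f): the constant function c integrates to c * |I|.
law_f_holds = pc_integral(lengths, [c] * len(lengths)) == c * 3
```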

This concludes our integration of piecewise constant functions. We 
now turn to the question of how to integrate bounded functions. 

— Exercises — 

Exercise 11.2.1. Prove Lemma 11.2.7. 

Exercise 11.2.2. Prove Lemma 11.2.8. (Hint: use Lemmas 11.1.18 and 11.2.7 
to make f and g piecewise constant with respect to the same partition of I.) 

Exercise 11.2.3. Prove Proposition 11.2.13. (Hint: first use Theorem 11.1.13
to show that both integrals are equal to $\text{p.c.}\int_{[P \# P']} f$.)

Exercise 11.2.4. Prove Theorem 11.2.16. (Hint: you can use earlier parts of 
the theorem to prove some of the later parts of the theorem. See also the hint 
to Exercise 11.2.2.) 


11.3 Upper and lower Riemann integrals 

Now let $f : I \to \mathbf{R}$ be a bounded function defined on a bounded
interval $I$. We want to define the Riemann integral $\int_I f$. To do this we
first need to define the notion of upper and lower Riemann integrals
$\overline{\int}_I f$ and $\underline{\int}_I f$. These notions are related to
the Riemann integral in much the same way that the lim sup and lim inf of a
sequence are related to the limit of that sequence.

Definition 11.3.1 (Majorization of functions). Let $f : I \to \mathbf{R}$ and
$g : I \to \mathbf{R}$. We say that $g$ majorizes $f$ on $I$ if we have
$g(x) \ge f(x)$ for all $x \in I$, and that $g$ minorizes $f$ on $I$ if
$g(x) \le f(x)$ for all $x \in I$.

The idea of the Riemann integral is to try to integrate a function
by first majorizing or minorizing that function by a piecewise constant
function (which we already know how to integrate).

Definition 11.3.2 (Upper and lower Riemann integrals). Let
$f : I \to \mathbf{R}$ be a bounded function defined on a bounded interval $I$.
We define the upper Riemann integral $\overline{\int}_I f$ by the formula

\[ \overline{\int}_I f := \inf\{ \text{p.c.}\int_I g : g \text{ is a p.c. function on } I \text{ which majorizes } f \} \]

and the lower Riemann integral $\underline{\int}_I f$ by the formula

\[ \underline{\int}_I f := \sup\{ \text{p.c.}\int_I g : g \text{ is a p.c. function on } I \text{ which minorizes } f \}. \]

We give a crude but useful bound on the lower and upper integral:

Lemma 11.3.3. Let $f : I \to \mathbf{R}$ be a function on a bounded interval
$I$ which is bounded by some real number $M$, i.e., $-M \le f(x) \le M$ for all
$x \in I$. Then we have

\[ -M|I| \le \underline{\int}_I f \le \overline{\int}_I f \le M|I|. \]

In particular, both the lower and upper Riemann integrals are real numbers
(i.e., they are not infinite).

Proof. The function $g : I \to \mathbf{R}$ defined by $g(x) = M$ is constant,
hence piecewise constant, and majorizes $f$; thus
$\overline{\int}_I f \le \text{p.c.}\int_I g = M|I|$ by definition of the upper
Riemann integral. A similar argument gives $-M|I| \le \underline{\int}_I f$.
Finally, we have to show that $\underline{\int}_I f \le \overline{\int}_I f$.
Let $g$ be any piecewise constant function majorizing $f$, and let $h$ be any
piecewise constant function minorizing $f$. Then $g$ majorizes $h$, and hence
$\text{p.c.}\int_I h \le \text{p.c.}\int_I g$. Taking suprema in $h$, we obtain
that $\underline{\int}_I f \le \text{p.c.}\int_I g$. Taking infima in $g$, we
thus obtain $\underline{\int}_I f \le \overline{\int}_I f$, as desired. □

We now know that the upper Riemann integral is always at least as 
large as the lower Riemann integral. If the two integrals match, then we 
can define the Riemann integral: 

Definition 11.3.4 (Riemann integral). Let $f : I \to \mathbf{R}$ be a bounded
function on a bounded interval $I$. If
$\underline{\int}_I f = \overline{\int}_I f$, then we say that $f$ is
Riemann integrable on $I$ and define

\[ \int_I f := \underline{\int}_I f = \overline{\int}_I f. \]

If the upper and lower Riemann integrals are unequal, we say that $f$ is
not Riemann integrable.

Remark 11.3.5. Compare this definition to the relationship between 
the lim sup, lim inf, and limit of a sequence $a_n$ that was established in
Proposition 6.4.12(f); the lim sup is always greater than or equal to the 
lim inf, but they are only equal when the sequence converges, and in this 
case they are both equal to the limit of the sequence. The definition given 
above may differ from the definition you may have encountered in your 
calculus courses, based on Riemann sums. However, the two definitions 
turn out to be equivalent; this is the purpose of the next section. 

Remark 11.3.6. Note that we do not consider unbounded functions to 
be Riemann integrable; an integral involving such functions is known as 
an improper integral. It is possible to still evaluate such integrals using 
more sophisticated integration methods (such as the Lebesgue integral);
we shall do this in Chapter 11.45. 

The Riemann integral is consistent with (and supersedes) the piecewise
constant integral:

Lemma 11.3.7. Let $f : I \to \mathbf{R}$ be a piecewise constant function on a
bounded interval $I$. Then $f$ is Riemann integrable, and
$\int_I f = \text{p.c.}\int_I f$.

Proof. See Exercise 11.3.3. □

Remark 11.3.8. Because of this lemma, we will not refer to the piecewise
constant integral $\text{p.c.}\int_I$ again, and just use the Riemann integral
$\int_I$ throughout (until this integral is itself superseded by the Lebesgue
integral in Chapter 11.45). We observe one special case of Lemma 11.3.7: if
$I$ is a point or the empty set, then $\int_I f = 0$ for all functions
$f : I \to \mathbf{R}$. (Note that all such functions are automatically
constant.)

We have just shown that every piecewise constant function is Rie- 
mann integrable. However, the Riemann integral is more general, and 
can integrate a wider class of functions; we shall see this shortly. For 
now, we connect the Riemann integral we have just defined to the concept
of a Riemann sum, which you may have seen in other treatments
of the Riemann integral.

Definition 11.3.9 (Riemann sums). Let $f : I \to \mathbf{R}$ be a bounded
function on a bounded interval $I$, and let $P$ be a partition of $I$. We
define the upper Riemann sum $U(f, P)$ and the lower Riemann sum $L(f, P)$ by

\[ U(f, P) := \sum_{J \in P : J \ne \emptyset} \Big( \sup_{x \in J} f(x) \Big) |J| \]

and

\[ L(f, P) := \sum_{J \in P : J \ne \emptyset} \Big( \inf_{x \in J} f(x) \Big) |J|. \]

Remark 11.3.10. The restriction $J \ne \emptyset$ is required because the
quantities $\inf_{x \in J} f(x)$ and $\sup_{x \in J} f(x)$ are infinite (or
negative infinite) if $J$ is empty.
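To make Definition 11.3.9 concrete, the sketch below computes $U(f, P)$ and
$L(f, P)$ for the hypothetical example $f(x) = x^2$ on $[0, 1]$, with $P$
consisting of $n$ equal half-open intervals together with the point $\{1\}$.
The function name `riemann_sums_increasing` is an assumption of this example,
and the shortcut of reading the sup and inf off the endpoints is valid only
because this $f$ is continuous and monotone increasing.

```python
from fractions import Fraction

def riemann_sums_increasing(f, a, b, n):
    """Upper and lower Riemann sums U(f, P), L(f, P) for a continuous,
    monotone increasing f on [a, b], where P consists of n half-open
    intervals [x_j, x_{j+1}) of equal length together with the point {b}."""
    a, b = Fraction(a), Fraction(b)
    width = (b - a) / n
    upper = lower = Fraction(0)
    for j in range(n):
        lo, hi = a + j * width, a + (j + 1) * width
        upper += f(hi) * width   # sup over [lo, hi) equals f(hi) by continuity
        lower += f(lo) * width   # inf over [lo, hi) is attained at lo
    # The point {b} has length zero and contributes nothing.
    return upper, lower

square = lambda x: x * x
upper_4, lower_4 = riemann_sums_increasing(square, 0, 1, 4)
upper_100, lower_100 = riemann_sums_increasing(square, 0, 1, 100)
```

Refining the partition from 4 to 100 pieces narrows the gap between the two
sums, which is the behaviour the upper and lower Riemann integrals capture.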

We now connect these Riemann sums to the upper and lower Rie- 
mann integral. 

Lemma 11.3.11. Let $f : I \to \mathbf{R}$ be a bounded function on a bounded
interval $I$, and let $g$ be a function which majorizes $f$ and which is
piecewise constant with respect to some partition $P$ of $I$. Then

\[ \text{p.c.}\int_I g \ge U(f, P). \]

Similarly, if $h$ is a function which minorizes $f$ and is piecewise constant
with respect to $P$, then

\[ \text{p.c.}\int_I h \le L(f, P). \]

Proof. See Exercise 11.3.4. □

Proposition 11.3.12. Let $f : I \to \mathbf{R}$ be a bounded function on a
bounded interval $I$. Then

\[ \overline{\int}_I f = \inf\{ U(f, P) : P \text{ is a partition of } I \} \]

and

\[ \underline{\int}_I f = \sup\{ L(f, P) : P \text{ is a partition of } I \}. \]

Proof. See Exercise 11.3.5. □

— Exercises —

Exercise 11.3.1. Let $f : I \to \mathbf{R}$, $g : I \to \mathbf{R}$, and
$h : I \to \mathbf{R}$ be functions. Show that if $f$ majorizes $g$ and $g$
majorizes $h$, then $f$ majorizes $h$. Show that if $f$ and $g$ majorize each
other, then they must be equal.

Exercise 11.3.2. Let $f : I \to \mathbf{R}$, $g : I \to \mathbf{R}$, and
$h : I \to \mathbf{R}$ be functions. If $f$ majorizes $g$, is it true that
$f + h$ majorizes $g + h$? Is it true that $f \cdot h$ majorizes $g \cdot h$?
If $c$ is a real number, is it true that $cf$ majorizes $cg$?

Exercise 11.3.3. Prove Lemma 11.3.7. 

Exercise 11.3.4. Prove Lemma 11.3.11. 

Exercise 11.3.5. Prove Proposition 11.3.12. (Hint: you will need Lemma 
11.3.11, even though this Lemma will only do half of the job.) 


11.4 Basic properties of the Riemann integral 

Just as we did with limits, series, and derivatives, we now give the basic
laws for manipulating the Riemann integral. These laws will eventually
be superseded by the corresponding laws for the Lebesgue integral
(Proposition 11.48.3).

Theorem 11.4.1 (Laws of Riemann integration). Let $I$ be a bounded
interval, and let $f : I \to \mathbf{R}$ and $g : I \to \mathbf{R}$ be Riemann
integrable functions on $I$.

(a) The function $f + g$ is Riemann integrable, and we have
$\int_I (f + g) = \int_I f + \int_I g$.

(b) For any real number $c$, the function $cf$ is Riemann integrable, and
we have $\int_I (cf) = c(\int_I f)$.

(c) The function $f - g$ is Riemann integrable, and we have
$\int_I (f - g) = \int_I f - \int_I g$.

(d) If $f(x) \ge 0$ for all $x \in I$, then $\int_I f \ge 0$.

(e) If $f(x) \ge g(x)$ for all $x \in I$, then $\int_I f \ge \int_I g$.

(f) If $f$ is the constant function $f(x) = c$ for all $x$ in $I$, then
$\int_I f = c|I|$.

(g) Let $J$ be a bounded interval containing $I$ (i.e., $I \subseteq J$), and
let $F : J \to \mathbf{R}$ be the function

\[ F(x) := \begin{cases} f(x) & \text{if } x \in I \\ 0 & \text{if } x \notin I. \end{cases} \]

Then $F$ is Riemann integrable on $J$, and $\int_J F = \int_I f$.

(h) Suppose that $\{J, K\}$ is a partition of $I$ into two intervals $J$ and
$K$. Then the functions $f|_J : J \to \mathbf{R}$ and $f|_K : K \to \mathbf{R}$
are Riemann integrable on $J$ and $K$ respectively, and we have

\[ \int_I f = \int_J f|_J + \int_K f|_K. \]

Proof. See Exercise 11.4.1. □

Remark 11.4.2. We often abbreviate $\int_J f|_J$ as $\int_J f$, even though
$f$ is really defined on a larger domain than just $J$.

Theorem 11.4.1 asserts that the sum or difference of any two Riemann
integrable functions is Riemann integrable, as is any scalar multiple
$cf$ of a Riemann integrable function $f$. We now give some further
ways to create Riemann integrable functions.

Theorem 11.4.3 (Max and min preserve integrability). Let $I$ be a
bounded interval, and let $f : I \to \mathbf{R}$ and $g : I \to \mathbf{R}$ be
Riemann integrable functions. Then the functions
$\max(f, g) : I \to \mathbf{R}$ and $\min(f, g) : I \to \mathbf{R}$ defined by
$\max(f, g)(x) := \max(f(x), g(x))$ and $\min(f, g)(x) := \min(f(x), g(x))$
are also Riemann integrable.

Proof. We shall just prove the claim for $\max(f, g)$, the case of
$\min(f, g)$ being similar. First note that since $f$ and $g$ are bounded,
$\max(f, g)$ is also bounded.

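Let $\varepsilon > 0$. Since $\int_I f = \underline{\int}_I f$, there exists a
piecewise constant function $\underline{f} : I \to \mathbf{R}$ which minorizes
$f$ on $I$ such that

\[ \int_I \underline{f} \ge \int_I f - \varepsilon. \]

Similarly we can find a piecewise constant
$\underline{g} : I \to \mathbf{R}$ which minorizes $g$ on $I$ such that

\[ \int_I \underline{g} \ge \int_I g - \varepsilon, \]

and we can find piecewise constant functions $\overline{f}$, $\overline{g}$
which majorize $f$, $g$ respectively on $I$ such that

\[ \int_I \overline{f} \le \int_I f + \varepsilon \]

and

\[ \int_I \overline{g} \le \int_I g + \varepsilon. \]

In particular, if $h : I \to \mathbf{R}$ denotes the function

\[ h := (\overline{f} - \underline{f}) + (\overline{g} - \underline{g}) \]

we have

\[ \int_I h \le 4\varepsilon. \]

On the other hand, $\max(\underline{f}, \underline{g})$ is a piecewise
constant function on $I$ (why?) which minorizes $\max(f, g)$ (why?), while
$\max(\overline{f}, \overline{g})$ is similarly a piecewise constant function
on $I$ which majorizes $\max(f, g)$. Thus

\[ \int_I \max(\underline{f}, \underline{g}) \le \underline{\int}_I \max(f, g) \le \overline{\int}_I \max(f, g) \le \int_I \max(\overline{f}, \overline{g}), \]

and so

\[ 0 \le \overline{\int}_I \max(f, g) - \underline{\int}_I \max(f, g) \le \int_I \big( \max(\overline{f}, \overline{g}) - \max(\underline{f}, \underline{g}) \big). \]

But we have

\[ \overline{f}(x) = \underline{f}(x) + (\overline{f} - \underline{f})(x) \le \underline{f}(x) + h(x) \]

and similarly

\[ \overline{g}(x) = \underline{g}(x) + (\overline{g} - \underline{g})(x) \le \underline{g}(x) + h(x) \]

and thus

\[ \max(\overline{f}(x), \overline{g}(x)) \le \max(\underline{f}(x), \underline{g}(x)) + h(x). \]

Inserting this into the previous inequality, we obtain

\[ 0 \le \overline{\int}_I \max(f, g) - \underline{\int}_I \max(f, g) \le \int_I h \le 4\varepsilon. \]

To summarize, we have shown that

\[ 0 \le \overline{\int}_I \max(f, g) - \underline{\int}_I \max(f, g) \le 4\varepsilon \]

for every $\varepsilon$. Since
$\overline{\int}_I \max(f, g) - \underline{\int}_I \max(f, g)$ does not depend
on $\varepsilon$, we thus see that

\[ \overline{\int}_I \max(f, g) - \underline{\int}_I \max(f, g) = 0 \]

and hence that $\max(f, g)$ is Riemann integrable. □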


Corollary 11.4.4 (Absolute values preserve Riemann integrability).
Let $I$ be a bounded interval. If $f : I \to \mathbf{R}$ is a Riemann
integrable function, then the positive part $f_+ := \max(f, 0)$ and the
negative part $f_- := \min(f, 0)$ are also Riemann integrable on $I$. The
absolute value $|f| = f_+ - f_-$ is also Riemann integrable on $I$.
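The pointwise identities behind Corollary 11.4.4, namely $f = f_+ + f_-$ and
$|f| = f_+ - f_-$ with $f_+ = \max(f, 0)$ and $f_- = \min(f, 0)$, can be
checked on sample values. This is a small sanity check rather than part of
the proof; the helper names and the sample values are assumptions of this
sketch.

```python
def positive_part(value):
    """max(value, 0): the positive part of a real number."""
    return max(value, 0)

def negative_part(value):
    """min(value, 0): the negative part (note that it is <= 0)."""
    return min(value, 0)

# Hypothetical sample values of f(x), all exactly representable as floats.
samples = [-3.5, -1, 0, 2, 7.25]

decomposition_ok = all(positive_part(v) + negative_part(v) == v for v in samples)
absolute_ok = all(positive_part(v) - negative_part(v) == abs(v) for v in samples)
```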

Theorem 11.4.5 (Products preserve Riemann integrability). Let $I$ be
a bounded interval. If $f : I \to \mathbf{R}$ and $g : I \to \mathbf{R}$ are
Riemann integrable, then $fg : I \to \mathbf{R}$ is also Riemann integrable.

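Proof. This one is a little trickier. We split $f = f_+ + f_-$ and
$g = g_+ + g_-$ into positive and negative parts; by Corollary 11.4.4, the
functions $f_+$, $f_-$, $g_+$, $g_-$ are Riemann integrable. Since

\[ fg = f_+ g_+ + f_+ g_- + f_- g_+ + f_- g_- \]

it suffices to show that the functions $f_+ g_+$, $f_+ g_-$, $f_- g_+$,
$f_- g_-$ are individually Riemann integrable. We will just show this for
$f_+ g_+$; the other three are similar.

Since $f_+$ and $g_+$ are bounded and non-negative, there are
$M_1, M_2 > 0$ such that

\[ 0 \le f_+(x) \le M_1 \quad \text{and} \quad 0 \le g_+(x) \le M_2 \]

for all $x \in I$. Now let $\varepsilon > 0$ be arbitrary. Then, as in the
proof of Theorem 11.4.3, we can find a piecewise constant function
$\underline{f_+}$ minorizing $f_+$ on $I$, and a piecewise constant function
$\overline{f_+}$ majorizing $f_+$ on $I$, such that

\[ \int_I \overline{f_+} \le \int_I f_+ + \varepsilon \]

and

\[ \int_I \underline{f_+} \ge \int_I f_+ - \varepsilon. \]

Note that $\underline{f_+}$ may be negative at places, but we can fix this by
replacing $\underline{f_+}$ by $\max(\underline{f_+}, 0)$, since this still
minorizes $f_+$ (why?) and still has integral greater than or equal to
$\int_I f_+ - \varepsilon$ (why?). So without loss of generality we may
assume that $\underline{f_+}(x) \ge 0$ for all $x \in I$. Similarly we may
assume that $\overline{f_+}(x) \le M_1$ for all $x \in I$; thus

\[ 0 \le \underline{f_+}(x) \le f_+(x) \le \overline{f_+}(x) \le M_1 \]

for all $x \in I$.

Similar reasoning allows us to find piecewise constant $\underline{g_+}$
minorizing $g_+$, and $\overline{g_+}$ majorizing $g_+$, such that

\[ \int_I \overline{g_+} \le \int_I g_+ + \varepsilon \]

and

\[ \int_I \underline{g_+} \ge \int_I g_+ - \varepsilon, \]

and

\[ 0 \le \underline{g_+}(x) \le g_+(x) \le \overline{g_+}(x) \le M_2 \]

for all $x \in I$.

Notice that $\underline{f_+}\,\underline{g_+}$ is piecewise constant and
minorizes $f_+ g_+$, while $\overline{f_+}\,\overline{g_+}$ is piecewise
constant and majorizes $f_+ g_+$. Thus

\[ 0 \le \overline{\int}_I f_+ g_+ - \underline{\int}_I f_+ g_+ \le \int_I \big( \overline{f_+}\,\overline{g_+} - \underline{f_+}\,\underline{g_+} \big). \]

However, we have

\[ \overline{f_+}(x)\overline{g_+}(x) - \underline{f_+}(x)\underline{g_+}(x) = \overline{f_+}(x)(\overline{g_+} - \underline{g_+})(x) + \underline{g_+}(x)(\overline{f_+} - \underline{f_+})(x) \]
\[ \le M_1 (\overline{g_+} - \underline{g_+})(x) + M_2 (\overline{f_+} - \underline{f_+})(x) \]

for all $x \in I$, and thus

\[ 0 \le \overline{\int}_I f_+ g_+ - \underline{\int}_I f_+ g_+ \le M_1 \int_I (\overline{g_+} - \underline{g_+}) + M_2 \int_I (\overline{f_+} - \underline{f_+}) \le M_1 (2\varepsilon) + M_2 (2\varepsilon). \]

Again, since $\varepsilon$ was arbitrary, we can conclude that $f_+ g_+$ is
Riemann integrable, as before. Similar arguments show that $f_+ g_-$,
$f_- g_+$, $f_- g_-$ are Riemann integrable; combining them we obtain that
$fg$ is Riemann integrable. □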
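The algebraic step in the middle of this proof is an add-and-subtract trick.
Writing $\overline{f_+}$, $\underline{f_+}$ (and likewise for $g_+$) for the
majorizing and minorizing piecewise constant functions, a worked expansion
of that one step reads as follows; it is offered as a reading aid and adds
nothing beyond the argument above.

```latex
\begin{align*}
\overline{f_+}\,\overline{g_+} - \underline{f_+}\,\underline{g_+}
 &= \overline{f_+}\,\overline{g_+} - \overline{f_+}\,\underline{g_+}
    + \overline{f_+}\,\underline{g_+} - \underline{f_+}\,\underline{g_+} \\
 &= \overline{f_+}\,(\overline{g_+} - \underline{g_+})
    + \underline{g_+}\,(\overline{f_+} - \underline{f_+}) \\
 &\le M_1\,(\overline{g_+} - \underline{g_+})
    + M_2\,(\overline{f_+} - \underline{f_+}).
\end{align*}
```

The last line uses $0 \le \overline{f_+} \le M_1$ and
$0 \le \underline{g_+} \le M_2$, together with the fact that both differences
in parentheses are non-negative.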


— Exercises — 

Exercise 11.4.1. Prove Theorem 11.4.1. (Hint: you may find Theorem 11.2.16
to be useful. For part (b): First do the case $c > 0$. Then do the case
$c = -1$ and $c = 0$ separately. Using these cases, deduce the case of
$c < 0$. You can use earlier parts of the theorem to prove later ones.)

Exercise 11.4.2. Let $a < b$ be real numbers, and let
$f : [a, b] \to \mathbf{R}$ be a continuous, non-negative function (so
$f(x) \ge 0$ for all $x \in [a, b]$). Suppose that $\int_{[a,b]} f = 0$. Show
that $f(x) = 0$ for all $x \in [a, b]$. (Hint: argue by contradiction.)

Exercise 11.4.3. Let $I$ be a bounded interval, let $f : I \to \mathbf{R}$ be
a Riemann integrable function, and let $P$ be a partition of $I$. Show that

\[ \int_I f = \sum_{J \in P} \int_J f. \]

Exercise 11.4.4. Without repeating all the computations in the above proofs,
give a short explanation as to why the remaining cases of Theorem 11.4.3
and Theorem 11.4.5 follow automatically from the cases presented in the text.
(Hint: from Theorem 11.4.1 we know that if $f$ is Riemann integrable, then so
is $-f$.)


11.5 Riemann integrability of continuous functions 

We have already said a lot about Riemann integrable functions so far, 
but we have not yet actually produced any such functions other than the 
piecewise constant ones. Now we rectify this by showing that a large 
class of useful functions are Riemann integrable. We begin with the 
uniformly continuous functions. 

Theorem 11.5.1. Let I be a bounded interval, and let f be a function 
which is uniformly continuous on I . Then f is Riemann integrable. 

Proof. From Proposition 9.9.15 we see that $f$ is bounded. Now we have
to show that $\underline{\int}_I f = \overline{\int}_I f$.

If $I$ is a point or the empty set then the theorem is trivial, so let us
assume that $I$ is one of the four intervals $[a, b]$, $(a, b)$, $(a, b]$, or
$[a, b)$ for some real numbers $a < b$.

Let $\varepsilon > 0$ be arbitrary. By uniform continuity, there exists a
$\delta > 0$ such that $|f(x) - f(y)| < \varepsilon$ whenever $x, y \in I$ are
such that $|x - y| < \delta$. By the Archimedean principle, there exists an
integer $N > 0$ such that $(b - a)/N < \delta$.

Note that we can partition $I$ into $N$ intervals $J_1, \ldots, J_N$, each of
length $(b - a)/N$. (How? One has to treat each of the cases $[a, b]$,
$(a, b)$, $(a, b]$, $[a, b)$ slightly differently.) By Proposition 11.3.12, we
thus have

\[ \overline{\int}_I f \le \sum_{k=1}^{N} \Big( \sup_{x \in J_k} f(x) \Big) |J_k| \]

and

\[ \underline{\int}_I f \ge \sum_{k=1}^{N} \Big( \inf_{x \in J_k} f(x) \Big) |J_k| \]

so in particular

\[ \overline{\int}_I f - \underline{\int}_I f \le \sum_{k=1}^{N} \Big( \sup_{x \in J_k} f(x) - \inf_{x \in J_k} f(x) \Big) |J_k|. \]

However, we have $|f(x) - f(y)| < \varepsilon$ for all $x, y \in J_k$, since
$|J_k| = (b - a)/N < \delta$. In particular we have

\[ f(x) < f(y) + \varepsilon \text{ for all } x, y \in J_k. \]

Taking suprema in $x$, we obtain

\[ \sup_{x \in J_k} f(x) \le f(y) + \varepsilon \text{ for all } y \in J_k, \]

and then taking infima in $y$ we obtain

\[ \sup_{x \in J_k} f(x) \le \inf_{y \in J_k} f(y) + \varepsilon. \]

Inserting this bound into our previous inequality, we obtain

\[ \overline{\int}_I f - \underline{\int}_I f \le \sum_{k=1}^{N} \varepsilon |J_k|, \]

but by Theorem 11.1.13 we thus have

\[ \overline{\int}_I f - \underline{\int}_I f \le \varepsilon (b - a). \]

But $\varepsilon > 0$ was arbitrary, while $(b - a)$ is fixed. Thus
$\overline{\int}_I f - \underline{\int}_I f$ cannot be positive. By Lemma
11.3.3 and the definition of Riemann integrability we thus have that $f$ is
Riemann integrable. □
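The recipe in this proof (choose $\delta$ from $\varepsilon$, choose $N$
with $(b - a)/N < \delta$, then bound the gap by $\varepsilon(b - a)$) can be
traced on a concrete function. The sketch below uses the hypothetical example
$f(x) = x^2$ on $[0, 1]$, for which $|f(x) - f(y)| \le 2|x - y|$ and hence
$\delta = \varepsilon/2$ works; the names `partition_count` and `delta_for`
are assumptions of this illustration.

```python
from fractions import Fraction
from math import floor

def partition_count(epsilon, a, b, delta_for):
    """Pick an integer N > 0 with (b - a)/N < delta, as the Archimedean
    principle guarantees, where delta = delta_for(epsilon)."""
    delta = delta_for(epsilon)
    return floor((b - a) / delta) + 1

# For f(x) = x^2 on [0, 1]: |f(x) - f(y)| = |x + y||x - y| <= 2|x - y|,
# so |x - y| < epsilon / 2 forces |f(x) - f(y)| < epsilon.
delta_for = lambda eps: eps / 2

a, b = Fraction(0), Fraction(1)
epsilon = Fraction(1, 10)
N = partition_count(epsilon, a, b, delta_for)
width = (b - a) / N

# sup - inf of x^2 over the k-th interval, read off its endpoints
# (valid because x^2 is continuous and increasing on [0, 1]).
oscillations = [((k + 1) * width) ** 2 - (k * width) ** 2 for k in range(N)]
gap_bound = sum((osc * width for osc in oscillations), Fraction(0))
```

Here every oscillation is at most $\varepsilon$, so the resulting bound on
the gap between upper and lower sums is at most $\varepsilon(b - a)$.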

Combining Theorem 11.5.1 with Theorem 9.9.16, we thus obtain 

Corollary 11.5.2. Let $[a, b]$ be a closed interval, and let
$f : [a, b] \to \mathbf{R}$ be continuous. Then $f$ is Riemann integrable.

Note that this Corollary is not true if $[a, b]$ is replaced by any other
sort of interval, since it is not even guaranteed then that continuous
functions are bounded. For instance, the function
$f : (0, 1) \to \mathbf{R}$ defined by $f(x) := 1/x$ is continuous but not
Riemann integrable. However, if we assume that a function is both continuous
and bounded, we can recover Riemann integrability:

Proposition 11.5.3. Let $I$ be a bounded interval, and let
$f : I \to \mathbf{R}$ be both continuous and bounded. Then $f$ is Riemann
integrable on $I$.

Proof. If $I$ is a point or an empty set then the claim is trivial; if $I$ is
a closed interval the claim follows from Corollary 11.5.2. So let us assume
that $I$ is of the form $(a, b]$, $(a, b)$, or $[a, b)$ for some $a < b$.

We have a bound $M$ for $f$, so that $-M \le f(x) \le M$ for all $x \in I$.
Now let $0 < \varepsilon < (b - a)/2$ be a small number. The function $f$
when restricted to the interval $[a + \varepsilon, b - \varepsilon]$ is
continuous, and hence Riemann integrable by Corollary 11.5.2. In particular,
we can find a piecewise constant function
$h : [a + \varepsilon, b - \varepsilon] \to \mathbf{R}$ which majorizes $f$ on
$[a + \varepsilon, b - \varepsilon]$ such that

\[ \int_{[a+\varepsilon, b-\varepsilon]} h \le \int_{[a+\varepsilon, b-\varepsilon]} f + \varepsilon. \]

Define $\tilde{h} : I \to \mathbf{R}$ by

\[ \tilde{h}(x) := \begin{cases} h(x) & \text{if } x \in [a + \varepsilon, b - \varepsilon] \\ M & \text{if } x \in I \setminus [a + \varepsilon, b - \varepsilon]. \end{cases} \]

Clearly $\tilde{h}$ is piecewise constant on $I$ and majorizes $f$; by
Theorem 11.2.16 we have

\[ \int_I \tilde{h} = \varepsilon M + \int_{[a+\varepsilon, b-\varepsilon]} h + \varepsilon M \le \int_{[a+\varepsilon, b-\varepsilon]} f + (2M + 1)\varepsilon. \]

In particular we have

\[ \overline{\int}_I f \le \int_{[a+\varepsilon, b-\varepsilon]} f + (2M + 1)\varepsilon. \]

A similar argument gives

\[ \underline{\int}_I f \ge \int_{[a+\varepsilon, b-\varepsilon]} f - (2M + 1)\varepsilon \]

and hence

\[ \overline{\int}_I f - \underline{\int}_I f \le (4M + 2)\varepsilon. \]

But $\varepsilon$ is arbitrary, and so we can argue as in the proof of
Theorem 11.5.1 to conclude Riemann integrability. □


This gives a large class of Riemann integrable functions already: the
bounded continuous functions. But we can expand this class a little
more, to include the bounded piecewise continuous functions.

Definition 11.5.4. Let $I$ be a bounded interval, and let
$f : I \to \mathbf{R}$. We say that $f$ is piecewise continuous on $I$ iff
there exists a partition $P$ of $I$ such that $f|_J$ is continuous on $J$ for
all $J \in P$.

Example 11.5.5. The function $f : [1, 3] \to \mathbf{R}$ defined by

\[ f(x) := \begin{cases} x^2 & \text{if } 1 \le x < 2 \\ 7 & \text{if } x = 2 \\ x^3 & \text{if } 2 < x \le 3 \end{cases} \]

is not continuous on $[1, 3]$, but it is piecewise continuous on $[1, 3]$
(since it is continuous when restricted to $[1, 2)$ or $\{2\}$ or $(2, 3]$,
and those three intervals partition $[1, 3]$).

Proposition 11.5.6. Let $I$ be a bounded interval, and let
$f : I \to \mathbf{R}$ be both piecewise continuous and bounded. Then $f$ is
Riemann integrable.

Proof. See Exercise 11.5.1. □

— Exercises —

Exercise 11.5.1. Prove Proposition 11.5.6. (Hint: use Theorem 11.4.1(a) and
(h).)


11.6 Riemann integrability of monotone functions 


In addition to piecewise continuous functions, another wide class of func- 
tions is Riemann integrable, namely the monotone functions. We give 
two instances of this: 


Proposition 11.6.1. Let $[a, b]$ be a closed and bounded interval and let
$f : [a, b] \to \mathbf{R}$ be a monotone function. Then $f$ is Riemann
integrable on $[a, b]$.

Remark 11.6.2. From Exercise 9.8.5 we know that there exist mono- 
tone functions which are not piecewise continuous, so this proposition is 
not subsumed by Proposition 11.5.6. 

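Proof. Without loss of generality we may take $f$ to be monotone increasing
(instead of monotone decreasing). From Exercise 9.8.1 we know that $f$ is
bounded. Now let $N > 0$ be an integer, and partition $[a, b]$ into $N$
half-open intervals
$\{[a + \frac{b-a}{N} j, a + \frac{b-a}{N} (j + 1)) : 0 \le j \le N - 1\}$ of
length $(b - a)/N$, together with the point $\{b\}$. Then by Proposition
11.3.12 we have

\[ \overline{\int}_{[a,b]} f \le \sum_{j=0}^{N-1} \Big( \sup_{x \in [a + \frac{b-a}{N} j,\, a + \frac{b-a}{N}(j+1))} f(x) \Big) \frac{b-a}{N} \]

(the point $\{b\}$ clearly giving only a zero contribution). Since $f$ is
monotone increasing, we thus have

\[ \overline{\int}_{[a,b]} f \le \sum_{j=0}^{N-1} f\Big(a + \frac{b-a}{N}(j+1)\Big) \frac{b-a}{N}. \]

Similarly we have

\[ \underline{\int}_{[a,b]} f \ge \sum_{j=0}^{N-1} f\Big(a + \frac{b-a}{N} j\Big) \frac{b-a}{N}. \]

Thus we have

\[ \overline{\int}_{[a,b]} f - \underline{\int}_{[a,b]} f \le \sum_{j=0}^{N-1} \Big( f\Big(a + \frac{b-a}{N}(j+1)\Big) - f\Big(a + \frac{b-a}{N} j\Big) \Big) \frac{b-a}{N}. \]

Using telescoping series (Lemma 7.2.15) we thus have

\[ \overline{\int}_{[a,b]} f - \underline{\int}_{[a,b]} f \le \Big( f\Big(a + \frac{b-a}{N} N\Big) - f\Big(a + \frac{b-a}{N} 0\Big) \Big) \frac{b-a}{N} = (f(b) - f(a)) \frac{b-a}{N}. \]

But $N$ was arbitrary, so we can conclude as in the proof of Theorem
11.5.1 that $f$ is Riemann integrable. □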
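The telescoping step in this proof can be checked exactly on an example. The
sketch below uses the hypothetical monotone increasing function
$f(x) = x^3$ on $[0, 2]$ with $N = 50$; the helper name `endpoint_sums` is an
assumption of this illustration. It compares the difference of the
right-endpoint and left-endpoint sums with $(f(b) - f(a))(b - a)/N$.

```python
from fractions import Fraction

def endpoint_sums(f, a, b, n):
    """Right- and left-endpoint sums over n equal subintervals of [a, b].
    For monotone increasing f these are the bounds on the upper and lower
    Riemann integrals that appear in the proof."""
    a, b = Fraction(a), Fraction(b)
    width = (b - a) / n
    right = sum((f(a + (j + 1) * width) * width for j in range(n)), Fraction(0))
    left = sum((f(a + j * width) * width for j in range(n)), Fraction(0))
    return right, left

cube = lambda x: x ** 3
n = 50
right, left = endpoint_sums(cube, 0, 2, n)
gap = right - left
# (f(b) - f(a)) * (b - a) / N, the value the telescoping sum collapses to.
telescoped = (cube(Fraction(2)) - cube(Fraction(0))) * Fraction(2 - 0, n)
```

Since the bound is a fixed constant divided by $N$, it can be made as small
as desired by increasing $N$.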

Corollary 11.6.3. Let $I$ be a bounded interval, and let
$f : I \to \mathbf{R}$ be both monotone and bounded. Then $f$ is Riemann
integrable on $I$.

Proof. See Exercise 11.6.1. □ 

We now give the famous integral test for determining convergence of
monotone decreasing series.

Proposition 11.6.4 (Integral test). Let $f : [0, \infty) \to \mathbf{R}$ be a
monotone decreasing function which is non-negative (i.e., $f(x) \ge 0$ for
all $x \ge 0$). Then the sum $\sum_{n=0}^{\infty} f(n)$ is convergent if and
only if $\sup_{N > 0} \int_{[0,N]} f$ is finite.

Proof. See Exercise 11.6.3. □
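The comparison underlying the integral test can be observed numerically. The
sketch below takes the hypothetical example $f(x) = 1/(1+x)^2$ and checks,
for a range of $N$, that $\sum_{n=1}^{N} f(n) \le \int_{[0,N]} f \le
\sum_{n=0}^{N-1} f(n)$. The closed form $1 - 1/(N+1)$ for the integral is
borrowed from elementary calculus and is an assumption here, since this
section has not yet developed tools for evaluating integrals.

```python
from fractions import Fraction

def f(x):
    """A non-negative, monotone decreasing function on [0, infinity)."""
    return Fraction(1) / (1 + Fraction(x)) ** 2

def integral_0_to(n):
    # Closed form of the integral of 1/(1+x)^2 over [0, n], taken from
    # elementary calculus (an assumption; not derived in this section).
    return 1 - Fraction(1, n + 1)

def sandwich_holds(n):
    lower_sum = sum((f(k) for k in range(1, n + 1)), Fraction(0))  # k = 1..n
    upper_sum = sum((f(k) for k in range(0, n)), Fraction(0))      # k = 0..n-1
    return lower_sum <= integral_0_to(n) <= upper_sum

all_hold = all(sandwich_holds(n) for n in range(1, 60))
```

In this example the integrals stay below 1 for every $N$, so the partial
sums are bounded as well, which is the mechanism the test exploits.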

Corollary 11.6.5. Let $p$ be a real number. Then
$\sum_{n=1}^{\infty} \frac{1}{n^p}$ converges absolutely when $p > 1$ and
diverges when $p \le 1$.

Proof. See Exercise 11.6.5. □



— Exercises — 

Exercise 11.6.1. Use Proposition 11.6.1 to prove Corollary 11.6.3. (Hint: adapt 
the proof of Proposition 11.5.3.) 

Exercise 11.6.2. Formulate a reasonable notion of a piecewise monotone func- 
tion, and then show that all bounded piecewise monotone functions are Rie- 
mann integrable. 

Exercise 11.6.3. Prove Proposition 11.6.4. (Hint: what is the relationship
between the sum $\sum_{n=1}^{N} f(n)$, the sum $\sum_{n=0}^{N-1} f(n)$, and
the integral $\int_{[0,N]} f$?)

Exercise 11.6.4. Give examples to show that both directions of the integral
test break down if $f$ is not assumed to be monotone decreasing.

Exercise 11.6.5. Use Proposition 11.6.4 to prove Corollary 11.6.5. 

11.7 A non-Riemann integrable function 

We have shown that there are large classes of bounded functions which 
are Riemann integrable. Unfortunately, there do exist bounded func- 
tions which are not Riemann integrable: 

Proposition 11.7.1. Let $f : [0, 1] \to \mathbf{R}$ be the discontinuous
function

\[ f(x) := \begin{cases} 1 & \text{if } x \in \mathbf{Q} \\ 0 & \text{if } x \notin \mathbf{Q} \end{cases} \]

considered in Example 9.3.21. Then $f$ is bounded but not Riemann
integrable.

Proof. It is clear that $f$ is bounded, so let us show that it is not Riemann
integrable.

Let $P$ be any partition of $[0, 1]$. For any $J \in P$, observe that if $J$
is not a point or the empty set, then

\[ \sup_{x \in J} f(x) = 1 \]

(by Proposition 5.4.14). In particular we have

\[ \Big( \sup_{x \in J} f(x) \Big) |J| = |J|. \]

(Note this is also true when $J$ is a point, since both sides are zero.) In
particular we see that

\[ U(f, P) = \sum_{J \in P : J \ne \emptyset} |J| = |[0, 1]| = 1 \]

by Theorem 11.1.13; note that the empty set does not contribute anything
to the total length. In particular we have $\overline{\int}_{[0,1]} f = 1$,
by Proposition 11.3.12.

A similar argument gives that

\[ \inf_{x \in J} f(x) = 0 \]

for all $J$ (other than points or the empty set), and so

\[ L(f, P) = \sum_{J \in P : J \ne \emptyset} 0 = 0. \]

In particular we have $\underline{\int}_{[0,1]} f = 0$, by Proposition
11.3.12. Thus the upper and lower Riemann integrals do not match, and so this
function is not Riemann integrable. □

Remark 11.7.2. As you can see, it is only rather "artificial" bounded
functions which are not Riemann integrable. Because of this, the Riemann
integral is good enough for a large majority of cases. There are
ways to generalize or improve this integral, though. One of these is the
Lebesgue integral, which we will define in Chapter 11.45. Another is
the Riemann-Stieltjes integral $\int_I f \, d\alpha$, where
$\alpha : I \to \mathbf{R}$ is a monotone increasing function, which we define
in the next section.

11.8 The Riemann-Stieltjes integral 

Let $I$ be a bounded interval, let $\alpha : I \to \mathbf{R}$ be a monotone
increasing function, and let $f : I \to \mathbf{R}$ be a function. Then there
is a generalization of the Riemann integral, known as the Riemann-Stieltjes
integral. This integral is defined just like the Riemann integral, but with
one twist: instead of taking the length $|J|$ of intervals $J$, we take the
$\alpha$-length $\alpha[J]$, defined as follows. If $J$ is a point or the
empty set, then $\alpha[J] := 0$. If $J$ is an interval of the form $[a, b]$,
$(a, b)$, $(a, b]$, or $[a, b)$, then $\alpha[J] := \alpha(b) - \alpha(a)$.
Note that in the special case where $\alpha$ is the identity function
$\alpha(x) := x$, then $\alpha[J]$ is just the same as $|J|$. However, for
more general monotone functions $\alpha$, the $\alpha$-length $\alpha[J]$ is a
different quantity from $|J|$. Nevertheless, it turns out one can still do
much of the above theory, but replacing $|J|$ by $\alpha[J]$ throughout.

Definition 11.8.1 ($\alpha$-length). Let $I$ be a bounded interval, and let
$\alpha : X \to \mathbf{R}$ be a function defined on some domain $X$ which
contains $I$. Then we define the $\alpha$-length $\alpha[I]$ of $I$ as
follows. If $I$ is a point or the empty set, we set $\alpha[I] = 0$. If $I$ is
an interval of the form $[a, b]$, $[a, b)$, $(a, b]$, or $(a, b)$ for some
$b > a$, then we set $\alpha[I] = \alpha(b) - \alpha(a)$.

Example 11.8.2. Let $\alpha : \mathbf{R} \to \mathbf{R}$ be the function
$\alpha(x) := x^2$. Then $\alpha[[2, 3]] = \alpha(3) - \alpha(2) = 9 - 4 = 5$,
while $\alpha[(-3, -2)] = -5$. Meanwhile $\alpha[\{2\}] = 0$ and
$\alpha[\emptyset] = 0$.
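Definition 11.8.1 translates directly into a few lines of code, which can
then reproduce the values in Example 11.8.2. This is a sketch under a stated
assumption: the function name `alpha_length` and the convention of encoding
an interval by its endpoints (with no endpoints meaning the empty set) are
choices made for this illustration. Whether the endpoints belong to the
interval does not affect the $\alpha$-length.

```python
def alpha_length(alpha, a=None, b=None):
    """alpha-length of a bounded interval with endpoints a <= b.
    Pass no endpoints for the empty set; a == b covers a single point
    (or an empty interval such as (a, a)), which has alpha-length 0."""
    if a is None or b is None or a == b:
        return 0
    if b < a:
        raise ValueError("endpoints must satisfy a <= b")
    return alpha(b) - alpha(a)

square = lambda x: x * x      # alpha(x) = x^2, as in Example 11.8.2
identity = lambda x: x        # alpha(x) = x recovers the ordinary length
```

Note that $\alpha(x) = x^2$ is not monotone increasing on all of
$\mathbf{R}$, which is why the $\alpha$-length of $(-3, -2)$ comes out
negative.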





Example 11.8.3. Let $\alpha : \mathbf{R} \to \mathbf{R}$ be the identity
function $\alpha(x) := x$. Then $\alpha[I] = |I|$ for all bounded intervals
$I$ (why?). Thus the notion of length is a special case of the notion of
$\alpha$-length.

We sometimes write $\alpha|_a^b$ or $\alpha(x)|_{x=a}^{x=b}$ instead of
$\alpha[[a, b]]$.

One of the key theorems for the theory of the Riemann integral
was Theorem 11.1.13, which concerned length and partitions, and in
particular showed that $|I| = \sum_{J \in P} |J|$ whenever $P$ was a partition
of $I$. We now generalize this slightly.

Lemma 11.8.4. Let $I$ be a bounded interval, let
$\alpha : X \to \mathbf{R}$ be a function defined on some domain $X$ which
contains $I$, and let $P$ be a partition of $I$. Then we have

\[ \alpha[I] = \sum_{J \in P} \alpha[J]. \]

Proof. See Exercise 11.8.1. □

We can now define a generalization of Definition 11.2.9. 

Definition 11.8.5 (P.c. Riemann-Stieltjes integral). Let $I$ be a
bounded interval, and let $P$ be a partition of $I$. Let
$\alpha : X \to \mathbf{R}$ be a function defined on some domain $X$ which
contains $I$, and let $f : I \to \mathbf{R}$ be a function which is piecewise
constant with respect to $P$. Then we define

\[ \text{p.c.}\int_{[P]} f \, d\alpha := \sum_{J \in P} c_J \alpha[J] \]

where $c_J$ is the constant value of $f$ on $J$.

Example 11.8.6. Let $f : [1, 3] \to \mathbf{R}$ be the function

\[ f(x) := \begin{cases} 4 & \text{when } x \in [1, 2) \\ 2 & \text{when } x \in [2, 3], \end{cases} \]

let $\alpha : \mathbf{R} \to \mathbf{R}$ be the function $\alpha(x) := x^2$,
and let $P$ be the partition $P := \{[1, 2), [2, 3]\}$. Then

\[ \text{p.c.}\int_{[P]} f \, d\alpha = c_{[1,2)} \alpha[[1, 2)] + c_{[2,3]} \alpha[[2, 3]] \]
\[ = 4(\alpha(2) - \alpha(1)) + 2(\alpha(3) - \alpha(2)) = 4 \times 3 + 2 \times 5 = 22. \]

Example 11.8.7. Let $\alpha : \mathbf{R} \to \mathbf{R}$ be the identity
function $\alpha(x) := x$. Then for any bounded interval $I$, any partition
$P$ of $I$, and any function $f$ that is piecewise constant with respect to
$P$, we have
$\text{p.c.}\int_{[P]} f \, d\alpha = \text{p.c.}\int_{[P]} f$ (why?).

We can obtain an exact analogue of Proposition 11.2.13 by replacing
all the integrals $\text{p.c.}\int_{[P]} f$ in the proposition with
$\text{p.c.}\int_{[P]} f \, d\alpha$ (Exercise 11.8.2). We can thus define
$\text{p.c.}\int_I f \, d\alpha$ for any piecewise constant function
$f : I \to \mathbf{R}$ and any $\alpha : X \to \mathbf{R}$ defined on a domain
containing $I$, in analogy to before, by the formula

\[ \text{p.c.}\int_I f \, d\alpha := \text{p.c.}\int_{[P]} f \, d\alpha \]

for any partition $P$ of $I$ with respect to which $f$ is piecewise constant.
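Examples 11.8.6 and 11.8.7 can be reproduced with a short computation. In
the sketch below, the function name `pc_stieltjes_integral` and the encoding
of each piece as a triple (left endpoint, right endpoint, constant value) are
assumptions made for this illustration.

```python
def pc_stieltjes_integral(alpha, pieces):
    """Sum c_J * alpha[J] over pieces (a, b, c_J), where J has endpoints
    a <= b and alpha[J] = alpha(b) - alpha(a) (which is zero when a == b)."""
    return sum(c * (alpha(b) - alpha(a)) for a, b, c in pieces)

# f = 4 on [1,2) and f = 2 on [2,3], with the partition P = {[1,2), [2,3]}.
pieces = [(1, 2, 4), (2, 3, 2)]

with_square = pc_stieltjes_integral(lambda x: x * x, pieces)   # alpha(x) = x^2
with_identity = pc_stieltjes_integral(lambda x: x, pieces)     # alpha(x) = x
ordinary = sum(c * (b - a) for a, b, c in pieces)              # sum of c_J * |J|
```

With $\alpha(x) = x^2$ the result matches Example 11.8.6, and with the
identity function it coincides with the ordinary piecewise constant
integral, as Example 11.8.7 asserts.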

Up until now, our function $\alpha : X \to \mathbf{R}$ could have been
arbitrary. Let us now assume that $\alpha$ is monotone increasing, i.e.,
$\alpha(y) \ge \alpha(x)$ whenever $x, y \in X$ are such that $y \ge x$. This
implies that $\alpha[I] \ge 0$ for all intervals $I$ in $X$ (why?). From this
one can easily verify that all the results from Theorem 11.2.16 continue to
hold when the integrals $\text{p.c.}\int_I f$ are replaced by
$\text{p.c.}\int_I f \, d\alpha$, and the lengths $|I|$ are replaced by
the $\alpha$-lengths $\alpha[I]$; see Exercise 11.8.3.

We can then define upper and lower Riemann-Stieltjes integrals
$\overline{\int}_I f \, d\alpha$ and $\underline{\int}_I f \, d\alpha$
whenever $f : I \to \mathbf{R}$ is bounded and $\alpha$ is defined on
a domain containing $I$, by the usual formulae

\[ \overline{\int}_I f \, d\alpha := \inf\{ \text{p.c.}\int_I g \, d\alpha : g \text{ is p.c. on } I \text{ and majorizes } f \} \]

and

\[ \underline{\int}_I f \, d\alpha := \sup\{ \text{p.c.}\int_I g \, d\alpha : g \text{ is p.c. on } I \text{ and minorizes } f \}. \]

We then say that $f$ is Riemann-Stieltjes integrable on $I$ with respect to
$\alpha$ if the upper and lower Riemann-Stieltjes integrals match, in which
case we set

\[ \int_I f \, d\alpha := \overline{\int}_I f \, d\alpha = \underline{\int}_I f \, d\alpha. \]

As before, when $\alpha$ is the identity function $\alpha(x) := x$ then the
Riemann-Stieltjes integral is identical to the Riemann integral; thus the
Riemann-Stieltjes integral is a generalization of the Riemann integral.
(We shall see another comparison between the two integrals a little later,
in Corollary 11.10.3.) Because of this, we sometimes write $\int_I f$ as
$\int_I f \, dx$ or $\int_I f(x) \, dx$.

Most (but not all) of the remaining theory of the Riemann integral
can then be carried over without difficulty, replacing Riemann integrals
with Riemann-Stieltjes integrals and lengths with $\alpha$-lengths. There are a
couple of results which break down; Theorem 11.4.1(g), Proposition 11.5.3,
and Proposition 11.5.6 are not necessarily true when $\alpha$ is discontinuous
at key places (e.g., if $f$ and $\alpha$ are both discontinuous at the same point,
then $\int_I f\,d\alpha$ is unlikely to be defined). However, Theorem 11.5.1 is still
true (Exercise 11.8.4).
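
To make the role of the $\alpha$-lengths concrete, here is a small Python sketch of the piecewise constant Riemann-Stieltjes integral. It assumes the convention that an interval with endpoints c < d has $\alpha$-length $\alpha(d) - \alpha(c)$, and that a single point has $\alpha$-length 0; the function names and the sample step function are our own illustrative choices.

```python
def sgn(x):
    """Signum function: 1 for x > 0, 0 for x = 0, -1 for x < 0."""
    return (x > 0) - (x < 0)


def identity(x):
    """alpha(x) = x, for which the alpha-length is the ordinary length."""
    return x


def alpha_length(alpha, c, d):
    """alpha-length of an interval with endpoints c <= d.

    Assumed convention: alpha(d) - alpha(c) when c < d, and 0 for a
    single point, whether or not the endpoints belong to the interval.
    """
    return alpha(d) - alpha(c) if c < d else 0


def pc_rs_integral(pieces, alpha):
    """Piecewise constant Riemann-Stieltjes integral.

    `pieces` lists (c, d, value) triples: the intervals of a partition,
    given by their endpoints, and the constant value taken on each.
    """
    return sum(value * alpha_length(alpha, c, d) for c, d, value in pieces)


# A step function on [-1, 1]: 5 on [-1, -0.5), 7 on [-0.5, 0.5], 10 on (0.5, 1].
g = [(-1.0, -0.5, 5.0), (-0.5, 0.5, 7.0), (0.5, 1.0, 10.0)]

if __name__ == "__main__":
    print(pc_rs_integral(g, identity))  # weights are ordinary lengths
    print(pc_rs_integral(g, sgn))       # only the piece straddling 0 counts
```

With $\alpha$ = sgn, the two outer pieces have $\alpha$-length zero, so only the value of the step function near 0 contributes; this is the mechanism behind Exercise 11.8.5 below.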


— Exercises — 

Exercise 11.8.1. Prove Lemma 11.8.4. (Hint: modify the proof of Theorem 
11.1.13.) 

Exercise 11.8.2. State and prove a version of Proposition 11.2.13 for the 
Riemann-Stieltjes integral. 

Exercise 11.8.3. State and prove a version of Theorem 11.2.16 for the Riemann- 
Stieltjes integral. 

Exercise 11.8.4. State and prove a version of Theorem 11.5.1 for the Riemann-
Stieltjes integral. (Hint: one has to be careful with the proof; the problem here
is that some of the references to the length $|J_k|$ should remain unchanged,
and other references to the length $|J_k|$ should be changed to the $\alpha$-length
$\alpha(J_k)$ - basically, all of the occurrences of $|J_k|$ which appear inside a summation
should be replaced with $\alpha(J_k)$, but the rest should be unchanged.)

Exercise 11.8.5. Let $\mathrm{sgn} : \mathbf{R} \to \mathbf{R}$ be the signum function

$$\mathrm{sgn}(x) := \begin{cases} 1 & \text{when } x > 0 \\ 0 & \text{when } x = 0 \\ -1 & \text{when } x < 0. \end{cases}$$

Let $f : [-1,1] \to \mathbf{R}$ be a continuous function. Show that $f$ is Riemann-Stieltjes
integrable with respect to sgn, and that

$$\int_{[-1,1]} f\,d\,\mathrm{sgn} = 2f(0).$$

(Hint: for every $\varepsilon > 0$, find piecewise constant functions majorizing and
minorizing $f$ whose Riemann-Stieltjes integral is $\varepsilon$-close to $2f(0)$.)


11.9 The two fundamental theorems of calculus 

We now have enough machinery to connect integration and differentiation
via the familiar fundamental theorem of calculus. Actually, there
are two such theorems, one involving the derivative of the integral, and
the other involving the integral of the derivative.

Theorem 11.9.1 (First Fundamental Theorem of Calculus). Let $a < b$
be real numbers, and let $f : [a,b] \to \mathbf{R}$ be a Riemann integrable function.
Let $F : [a,b] \to \mathbf{R}$ be the function

$$F(x) := \int_{[a,x]} f.$$

Then $F$ is continuous. Furthermore, if $x_0 \in [a,b]$ and $f$ is continuous
at $x_0$, then $F$ is differentiable at $x_0$, and $F'(x_0) = f(x_0)$.

Proof. Since $f$ is Riemann integrable, it is bounded (by Definition
11.3.4). Thus we have some real number $M$ such that $-M \leq f(x) \leq M$
for all $x \in [a,b]$.

Now let $x < y$ be two elements of $[a,b]$. Then notice that

$$F(y) - F(x) = \int_{[a,y]} f - \int_{[a,x]} f = \int_{[x,y]} f$$

by Theorem 11.4.1(h). By Theorem 11.4.1(e) we thus have

$$\int_{[x,y]} f \leq \int_{[x,y]} M = \mathrm{p.c.}\int_{[x,y]} M = M(y - x)$$

and

$$\int_{[x,y]} f \geq \int_{[x,y]} -M = \mathrm{p.c.}\int_{[x,y]} -M = -M(y - x)$$

and thus

$$|F(y) - F(x)| \leq M(y - x).$$

This is for $y > x$. By interchanging $x$ and $y$ we thus see that

$$|F(y) - F(x)| \leq M(x - y)$$

when $x > y$. Also, we have $F(y) - F(x) = 0$ when $x = y$. Thus in all
three cases we have

$$|F(y) - F(x)| \leq M|x - y|.$$

Now let $x \in [a,b]$, and let $(x_n)_{n=0}^\infty$ be any sequence in $[a,b]$ converging
to $x$. Then we have

$$-M|x_n - x| \leq F(x_n) - F(x) \leq M|x_n - x|$$

for each $n$. But $-M|x_n - x|$ and $M|x_n - x|$ both converge to 0 as $n \to \infty$,
so by the squeeze test $F(x_n) - F(x)$ converges to 0 as $n \to \infty$, and thus
$\lim_{n \to \infty} F(x_n) = F(x)$. Since this is true for all sequences $x_n \in [a,b]$
converging to $x$, we thus see that $F$ is continuous at $x$. Since $x$ was an
arbitrary element of $[a,b]$, we thus see that $F$ is continuous.

Now suppose that $x_0 \in [a,b]$, and $f$ is continuous at $x_0$. Choose any
$\varepsilon > 0$. Then by continuity, we can find a $\delta > 0$ such that $|f(x) - f(x_0)| \leq \varepsilon$
for all $x$ in the interval $I := [x_0 - \delta, x_0 + \delta] \cap [a,b]$, or in other words

$$f(x_0) - \varepsilon \leq f(x) \leq f(x_0) + \varepsilon \text{ for all } x \in I.$$

We now show that

$$|F(y) - F(x_0) - f(x_0)(y - x_0)| \leq \varepsilon|y - x_0|$$

for all $y \in I$, since Proposition 10.1.7 will then imply that $F$ is differentiable
at $x_0$ with derivative $F'(x_0) = f(x_0)$ as desired.

Now fix $y \in I$. There are three cases. If $y = x_0$, then
$F(y) - F(x_0) - f(x_0)(y - x_0) = 0$ and so the claim is obvious. If $y > x_0$, then

$$F(y) - F(x_0) = \int_{[x_0,y]} f.$$

Since $x_0, y \in I$, and $I$ is a connected set, then $[x_0,y]$ is a subset of $I$,
and thus we have

$$f(x_0) - \varepsilon \leq f(x) \leq f(x_0) + \varepsilon \text{ for all } x \in [x_0,y],$$

and thus

$$(f(x_0) - \varepsilon)(y - x_0) \leq \int_{[x_0,y]} f \leq (f(x_0) + \varepsilon)(y - x_0)$$

and so in particular

$$|F(y) - F(x_0) - f(x_0)(y - x_0)| \leq \varepsilon|y - x_0|$$

as desired. The case $y < x_0$ is similar and is left to the reader. □

Example 11.9.2. Recall in Exercise 9.8.5 that we constructed a monotone
function $f : \mathbf{R} \to \mathbf{R}$ which was discontinuous at every rational
and continuous everywhere else. By Proposition 11.6.1, this monotone
function is Riemann integrable on $[0,1]$. If we define $F : [0,1] \to \mathbf{R}$ by
$F(x) := \int_{[0,x]} f$, then $F$ is a continuous function which is differentiable
at every irrational number. On the other hand, $F$ is non-differentiable
at every rational number; see Exercise 11.9.1.
Informally, the first fundamental theorem of calculus asserts that

$$\Big( \int_{[a,x]} f \Big)'(x) = f(x)$$

given a certain number of assumptions on $f$. Roughly, this means that
the derivative of an integral recovers the original function. Now we
show the reverse, that the integral of a derivative recovers the original
function.
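
As a numerical illustration of the first fundamental theorem (a sketch only, not part of the proof), the following Python snippet approximates $F(x) = \int_{[0,x]} \cos$ by a midpoint rule and compares the difference quotient of $F$ at $x_0 = 1$ with $\cos(1)$. The integrand, the base point and the midpoint rule are our own choices for the example.

```python
import math


def integral(f, a, b, n=20_000):
    """Midpoint-rule approximation to the Riemann integral of f on [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


def F(x):
    """F(x) := integral of cos over [0, x] (cos is our sample integrand)."""
    return integral(math.cos, 0.0, x)


def difference_quotient(x0, h):
    """(F(x0 + h) - F(x0)) / h, which should approach f(x0) as h -> 0."""
    return (F(x0 + h) - F(x0)) / h


if __name__ == "__main__":
    x0 = 1.0
    for h in (1e-1, 1e-2, 1e-3):
        # The middle column should drift towards the last one as h shrinks.
        print(h, difference_quotient(x0, h), math.cos(x0))
```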

Definition 11.9.3 (Antiderivatives). Let $I$ be a bounded interval, and
let $f : I \to \mathbf{R}$ be a function. We say that a function $F : I \to \mathbf{R}$ is an
antiderivative of $f$ if $F$ is differentiable on $I$ and $F'(x) = f(x)$ for all
$x \in I$.

Theorem 11.9.4 (Second Fundamental Theorem of Calculus). Let
$a < b$ be real numbers, and let $f : [a,b] \to \mathbf{R}$ be a Riemann integrable
function. If $F : [a,b] \to \mathbf{R}$ is an antiderivative of $f$, then

$$\int_{[a,b]} f = F(b) - F(a).$$

Proof. We will use Riemann sums. The idea is to show that

$$U(f,P) \geq F(b) - F(a) \geq L(f,P)$$

for every partition $P$ of $[a,b]$. The left inequality asserts that $F(b) - F(a)$
is a lower bound for $\{U(f,P) : P \text{ is a partition of } [a,b]\}$, while the right
inequality asserts that $F(b) - F(a)$ is an upper bound for $\{L(f,P) : P$
is a partition of $[a,b]\}$. But by Proposition 11.3.12, this means that

$$\overline{\int}_{[a,b]} f \geq F(b) - F(a) \geq \underline{\int}_{[a,b]} f,$$

but since $f$ is assumed to be Riemann integrable, both the upper and
lower Riemann integral equal $\int_{[a,b]} f$. The claim follows.

We have to show the bound $U(f,P) \geq F(b) - F(a) \geq L(f,P)$. We
shall just show the first inequality $U(f,P) \geq F(b) - F(a)$; the other
inequality is similar.

Let $P$ be a partition of $[a,b]$. From Lemma 11.8.4 we have

$$F(b) - F(a) = \sum_{J \in P} F[J] = \sum_{J \in P : J \neq \emptyset} F[J],$$

while from definition we have

$$U(f,P) = \sum_{J \in P : J \neq \emptyset} \sup_{x \in J} f(x)\,|J|.$$

Thus it will suffice to show that

$$F[J] \leq \sup_{x \in J} f(x)\,|J|$$

for all $J \in P$ (other than the empty set).

When $J$ is a point then the claim is clear, since both sides are zero.
Now suppose that $J = [c,d]$, $(c,d]$, $[c,d)$, or $(c,d)$ for some $c < d$. Then
the left-hand side is $F[J] = F(d) - F(c)$. By the mean-value theorem,
this is equal to $(d - c)F'(e)$ for some $e \in J$. But since $F'(e) = f(e)$, we
thus have

$$F[J] = (d - c)f(e) = f(e)\,|J| \leq \sup_{x \in J} f(x)\,|J|$$

as desired. □
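
The sandwich $U(f,P) \geq F(b) - F(a) \geq L(f,P)$ at the heart of this proof can be observed numerically. The sketch below uses the sample pair $f(x) = 3x^2$, $F(x) = x^3$ on $[0,2]$ with uniform partitions; these choices, and the shortcut of reading sup and inf off the endpoints (valid only because this $f$ is monotone increasing on $[0,2]$), are assumptions made for the illustration.

```python
def f(x):
    """Sample integrand; monotone increasing on [0, 2]."""
    return 3 * x * x


def F(x):
    """An antiderivative of f: F'(x) = 3x^2 = f(x)."""
    return x ** 3


def upper_lower_sums(f, a, b, n):
    """U(f, P) and L(f, P) for the uniform partition of [a, b] into n pieces.

    Assumes f is monotone increasing on [a, b], so that on each piece the
    sup is attained at the right endpoint and the inf at the left endpoint.
    """
    h = (b - a) / n
    upper = sum(f(a + (k + 1) * h) * h for k in range(n))
    lower = sum(f(a + k * h) * h for k in range(n))
    return upper, lower


if __name__ == "__main__":
    a, b = 0.0, 2.0
    for n in (10, 100, 1000):
        upper, lower = upper_lower_sums(f, a, b, n)
        # F(b) - F(a) stays trapped between the two sums for every n.
        print(n, lower, F(b) - F(a), upper)
```

As the partition is refined, the two sums close in on $F(b) - F(a)$ from either side, which is what forces the integral to equal that value.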

Of course, as you are all aware, one can use the second fundamental
theorem of calculus to compute integrals relatively easily provided that
you can find an anti-derivative of the integrand $f$. Note that the first
fundamental theorem of calculus ensures that every continuous Riemann
integrable function has an anti-derivative. For discontinuous functions,
the situation is more complicated, and is a graduate-level real analysis
topic which will not be discussed here. Also, not every function with
an anti-derivative is Riemann integrable; as an example, consider the
function $F : [-1,1] \to \mathbf{R}$ defined by $F(x) := x^2 \sin(1/x^3)$ when $x \neq 0$,
and $F(0) := 0$. Then $F$ is differentiable everywhere (why?), so $F'$ has
an antiderivative, but $F'$ is unbounded (why?), and so is not Riemann
integrable.
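
For readers who want to check the two "(why?)" claims, here is one possible computation, written as a sketch; the sample points $x_k$ are our own choice and are not taken from the text.

```latex
% For x \neq 0, by the product and chain rules:
F'(x) = 2x \sin(1/x^3) - \frac{3}{x^2} \cos(1/x^3).

% At x = 0, directly from the definition of the derivative:
F'(0) = \lim_{h \to 0} \frac{h^2 \sin(1/h^3) - 0}{h}
      = \lim_{h \to 0} h \sin(1/h^3) = 0,
\qquad \text{since } |h \sin(1/h^3)| \leq |h|.

% Along x_k := (2\pi k)^{-1/3}, k = 1, 2, 3, \ldots, we have 1/x_k^3 = 2\pi k,
% so \sin(1/x_k^3) = 0 and \cos(1/x_k^3) = 1, giving
F'(x_k) = -3 (2\pi k)^{2/3} \to -\infty \quad \text{as } k \to \infty.
```

Since the points $x_k$ all lie in $[-1,1]$, the last line shows that $F'$ is unbounded there.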

We now pause to mention the infamous “$+C$” ambiguity in anti-derivatives:

Lemma 11.9.5. Let $I$ be a bounded interval, and let $f : I \to \mathbf{R}$ be a
function. Let $F : I \to \mathbf{R}$ and $G : I \to \mathbf{R}$ be two antiderivatives of $f$.
Then there exists a real number $C$ such that $F(x) = G(x) + C$ for all
$x \in I$.


Proof. See Exercise 11.9.2. 


□ 
— Exercises —

Exercise 11.9.1. Let $f : [0,1] \to \mathbf{R}$ be the function in Exercise 9.8.5. Show that
for every rational number $q \in \mathbf{Q} \cap [0,1]$, the function $F : [0,1] \to \mathbf{R}$ defined by
the formula $F(x) := \int_0^x f(y)\,dy$ is not differentiable at $q$.

Exercise 11.9.2. Prove Lemma 11.9.5. (Hint: apply the mean-value theorem,
Corollary 10.2.9, to the function $F - G$. One can also prove this lemma using
the second fundamental theorem of calculus (how?), but one has to be careful
since we do not assume $f$ to be Riemann integrable.)

Exercise 11.9.3. Let $a < b$ be real numbers, and let $f : [a,b] \to \mathbf{R}$ be a
monotone increasing function. Let $F : [a,b] \to \mathbf{R}$ be the function
$F(x) := \int_{[a,x]} f$. Let $x_0$ be an element of $[a,b]$. Show that $F$ is differentiable at $x_0$ if
and only if $f$ is continuous at $x_0$. (Hint: one direction is taken care of by one
of the fundamental theorems of calculus. For the other, consider left and right
limits of $f$ and argue by contradiction.)


11.10 Consequences of the fundamental theorems 


We can now give a number of useful consequences of the fundamental 
theorems of calculus (beyond the obvious application, that one can now 
compute any integral for which an anti-derivative is known). The first 
application is the familiar integration by parts formula. 

Proposition 11.10.1 (Integration by parts formula). Let $I = [a,b]$,
and let $F : [a,b] \to \mathbf{R}$ and $G : [a,b] \to \mathbf{R}$ be differentiable functions on
$[a,b]$ such that $F'$ and $G'$ are Riemann integrable on $I$. Then we have

$$\int_{[a,b]} FG' = F(b)G(b) - F(a)G(a) - \int_{[a,b]} F'G.$$



Proof. See Exercise 11.10.1. 


□ 
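
A quick numerical sanity check of the integration by parts formula can be sketched as follows. The pair $F(x) = x$, $G(x) = \sin x$ on $[0, \pi/2]$ and the midpoint-rule integrator are assumptions chosen for this example; both sides should be close to $\pi/2 - 1$.

```python
import math


def integral(f, a, b, n=20_000):
    """Midpoint-rule approximation to the Riemann integral of f on [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


def F(x):
    return x


def dF(x):
    """Derivative of F."""
    return 1.0


G = math.sin   # sample choice of G
dG = math.cos  # derivative of G


def by_parts_sides(a, b):
    """Return (integral of F G', F(b)G(b) - F(a)G(a) - integral of F' G)."""
    lhs = integral(lambda x: F(x) * dG(x), a, b)
    rhs = F(b) * G(b) - F(a) * G(a) - integral(lambda x: dF(x) * G(x), a, b)
    return lhs, rhs


if __name__ == "__main__":
    print(by_parts_sides(0.0, math.pi / 2))
```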


Next, we show that under certain circumstances, one can write a 
Riemann-Stieltjes integral as a Riemann integral. We begin with piece- 
wise constant functions. 


Theorem 11.10.2. Let $\alpha : [a,b] \to \mathbf{R}$ be a monotone increasing function,
and suppose that $\alpha$ is also differentiable on $[a,b]$, with $\alpha'$ being
Riemann integrable. Let $f : [a,b] \to \mathbf{R}$ be a piecewise constant function
on $[a,b]$. Then $f\alpha'$ is Riemann integrable on $[a,b]$, and

$$\int_{[a,b]} f\,d\alpha = \int_{[a,b]} f\alpha'.$$

Proof. Since $f$ is piecewise constant, it is Riemann integrable, and since
$\alpha'$ is also Riemann integrable, then $f\alpha'$ is Riemann integrable by Theorem
11.4.5.

Suppose that $f$ is piecewise constant with respect to some partition
$P$ of $[a,b]$; without loss of generality we may assume that $P$ does not
contain the empty set. Then we have

$$\int_{[a,b]} f\,d\alpha = \mathrm{p.c.}\int_{[P]} f\,d\alpha = \sum_{J \in P} c_J\,\alpha[J]$$

where $c_J$ is the constant value of $f$ on $J$. On the other hand, from
Theorem 11.2.16(h) (generalized to partitions of arbitrary length - why
is this generalization true?) we have

$$\int_{[a,b]} f\alpha' = \sum_{J \in P} \int_J f\alpha' = \sum_{J \in P} \int_J c_J\alpha' = \sum_{J \in P} c_J \int_J \alpha'.$$

But by the second fundamental theorem of calculus (Theorem 11.9.4),
$\int_J \alpha' = \alpha[J]$, and the claim follows. □

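Corollary 11.10.3. Let $\alpha : [a,b] \to \mathbf{R}$ be a monotone increasing function,
and suppose that $\alpha$ is also differentiable on $[a,b]$, with $\alpha'$ being
Riemann integrable. Let $f : [a,b] \to \mathbf{R}$ be a function which is Riemann-
Stieltjes integrable with respect to $\alpha$ on $[a,b]$. Then $f\alpha'$ is Riemann
integrable on $[a,b]$, and

$$\int_{[a,b]} f\,d\alpha = \int_{[a,b]} f\alpha'.$$

Proof. Note that since $f$ and $\alpha'$ are bounded, then $f\alpha'$ must also be
bounded. Also, since $\alpha$ is monotone increasing and differentiable, $\alpha'$ is
non-negative.

Let $\varepsilon > 0$. Then, we can find a piecewise constant function $\overline{f}$ majorizing
$f$ on $[a,b]$, and a piecewise constant function $\underline{f}$ minorizing $f$ on
$[a,b]$, such that

$$\int_{[a,b]} f\,d\alpha - \varepsilon \leq \int_{[a,b]} \underline{f}\,d\alpha \leq \int_{[a,b]} \overline{f}\,d\alpha \leq \int_{[a,b]} f\,d\alpha + \varepsilon.$$

Applying Theorem 11.10.2, we obtain

$$\int_{[a,b]} f\,d\alpha - \varepsilon \leq \int_{[a,b]} \underline{f}\alpha' \leq \int_{[a,b]} \overline{f}\alpha' \leq \int_{[a,b]} f\,d\alpha + \varepsilon.$$

Since $\alpha'$ is non-negative and $\underline{f}$ minorizes $f$, then $\underline{f}\alpha'$ minorizes $f\alpha'$.
Thus $\int_{[a,b]} \underline{f}\alpha' \leq \underline{\int}_{[a,b]} f\alpha'$ (why?). Thus

$$\int_{[a,b]} f\,d\alpha - \varepsilon \leq \underline{\int}_{[a,b]} f\alpha'.$$

Similarly we have

$$\overline{\int}_{[a,b]} f\alpha' \leq \int_{[a,b]} f\,d\alpha + \varepsilon.$$

Since these statements are true for any $\varepsilon > 0$, we must have

$$\int_{[a,b]} f\,d\alpha \leq \underline{\int}_{[a,b]} f\alpha' \leq \overline{\int}_{[a,b]} f\alpha' \leq \int_{[a,b]} f\,d\alpha$$

and the claim follows. □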


Remark 11.10.4. Informally, Corollary 11.10.3 asserts that $f\,d\alpha$ is
essentially equivalent to $f \frac{d\alpha}{dx}\,dx$, when $\alpha$ is differentiable. However, the
advantage of the Riemann-Stieltjes integral is that it still makes sense
even when $\alpha$ is not differentiable.
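
The identity $\int f\,d\alpha = \int f\alpha'$ can be illustrated numerically. In the sketch below, the sample choices $f(x) = x$ and $\alpha(x) = x^2$ on $[0,1]$ are ours, and tagged sums (midpoint values times $\alpha$-lengths) stand in for the piecewise constant majorants and minorants used in the proof; both quantities should approach $\int_{[0,1]} 2x^2\,dx = 2/3$.

```python
def rs_sum(f, alpha, a, b, n):
    """Sum of f(midpoint) * alpha-length over a uniform partition of [a, b]."""
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        left, right = a + k * h, a + (k + 1) * h
        total += f(0.5 * (left + right)) * (alpha(right) - alpha(left))
    return total


def riemann_sum(g, a, b, n):
    """Midpoint Riemann sum of g over a uniform partition of [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))


def f(x):
    return x


def alpha(x):
    """Monotone increasing and differentiable on [0, 1]."""
    return x * x


def dalpha(x):
    """Derivative of alpha."""
    return 2 * x


if __name__ == "__main__":
    n = 1000
    print(rs_sum(f, alpha, 0.0, 1.0, n))                          # approximates the d-alpha integral
    print(riemann_sum(lambda x: f(x) * dalpha(x), 0.0, 1.0, n))   # approximates the integral of f * alpha'
```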


We now build up to the familiar change of variables formula. We 
first need a preliminary lemma. 


Lemma 11.10.5 (Change of variables formula I). Let $[a,b]$ be a closed
interval, and let $\phi : [a,b] \to [\phi(a), \phi(b)]$ be a continuous monotone
increasing function. Let $f : [\phi(a), \phi(b)] \to \mathbf{R}$ be a piecewise constant
function on $[\phi(a), \phi(b)]$. Then $f \circ \phi : [a,b] \to \mathbf{R}$ is also piecewise constant
on $[a,b]$, and

$$\int_{[a,b]} f \circ \phi\,d\phi = \int_{[\phi(a),\phi(b)]} f.$$

Proof. We give a sketch of the proof, leaving the gaps to be filled in
Exercise 11.10.2. Let $P$ be a partition of $[\phi(a), \phi(b)]$ such that $f$ is
piecewise constant with respect to $P$; we may assume that $P$ does not
contain the empty set. For each $J \in P$, let $c_J$ be the constant value of
$f$ on $J$, thus

$$\int_{[\phi(a),\phi(b)]} f = \sum_{J \in P} c_J\,|J|.$$

For each interval $J$, let $\phi^{-1}(J)$ be the set $\phi^{-1}(J) := \{x \in [a,b] : \phi(x) \in J\}$.
Then $\phi^{-1}(J)$ is connected (why?), and is thus an interval. Furthermore,
$c_J$ is the constant value of $f \circ \phi$ on $\phi^{-1}(J)$ (why?). Thus, if we
define $Q := \{\phi^{-1}(J) : J \in P\}$ (ignoring the fact that $Q$ has been used
to represent the rational numbers), then $Q$ partitions $[a,b]$ (why?), and
$f \circ \phi$ is piecewise constant with respect to $Q$ (why?). Thus

$$\int_{[a,b]} f \circ \phi\,d\phi = \int_{[Q]} f \circ \phi\,d\phi = \sum_{J \in P} c_J\,\phi[\phi^{-1}(J)].$$

But $\phi[\phi^{-1}(J)] = |J|$ (why?), and the claim follows. □

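Proposition 11.10.6 (Change of variables formula II). Let $[a,b]$ be a
closed interval, and let $\phi : [a,b] \to [\phi(a), \phi(b)]$ be a continuous monotone
increasing function. Let $f : [\phi(a), \phi(b)] \to \mathbf{R}$ be a Riemann integrable
function on $[\phi(a), \phi(b)]$. Then $f \circ \phi : [a,b] \to \mathbf{R}$ is Riemann-Stieltjes
integrable with respect to $\phi$ on $[a,b]$, and

$$\int_{[a,b]} f \circ \phi\,d\phi = \int_{[\phi(a),\phi(b)]} f.$$

Proof. This will be obtained from Lemma 11.10.5 in a similar manner
to how Corollary 11.10.3 was obtained from Theorem 11.10.2. First
observe that since $f$ is Riemann integrable, it is bounded, and then $f \circ \phi$
must also be bounded (why?).

Let $\varepsilon > 0$. Then, we can find a piecewise constant function $\overline{f}$ majorizing
$f$ on $[\phi(a), \phi(b)]$, and a piecewise constant function $\underline{f}$ minorizing
$f$ on $[\phi(a), \phi(b)]$, such that

$$\int_{[\phi(a),\phi(b)]} f - \varepsilon \leq \int_{[\phi(a),\phi(b)]} \underline{f} \leq \int_{[\phi(a),\phi(b)]} \overline{f} \leq \int_{[\phi(a),\phi(b)]} f + \varepsilon.$$

Applying Lemma 11.10.5, we obtain

$$\int_{[\phi(a),\phi(b)]} f - \varepsilon \leq \int_{[a,b]} \underline{f} \circ \phi\,d\phi \leq \int_{[a,b]} \overline{f} \circ \phi\,d\phi \leq \int_{[\phi(a),\phi(b)]} f + \varepsilon.$$

Since $\underline{f} \circ \phi$ is piecewise constant and minorizes $f \circ \phi$, we have

$$\int_{[a,b]} \underline{f} \circ \phi\,d\phi \leq \underline{\int}_{[a,b]} f \circ \phi\,d\phi$$

while similarly we have

$$\int_{[a,b]} \overline{f} \circ \phi\,d\phi \geq \overline{\int}_{[a,b]} f \circ \phi\,d\phi.$$

Thus

$$\int_{[\phi(a),\phi(b)]} f - \varepsilon \leq \underline{\int}_{[a,b]} f \circ \phi\,d\phi \leq \overline{\int}_{[a,b]} f \circ \phi\,d\phi \leq \int_{[\phi(a),\phi(b)]} f + \varepsilon.$$

Since $\varepsilon > 0$ was arbitrary, this implies that

$$\int_{[\phi(a),\phi(b)]} f \leq \underline{\int}_{[a,b]} f \circ \phi\,d\phi \leq \overline{\int}_{[a,b]} f \circ \phi\,d\phi \leq \int_{[\phi(a),\phi(b)]} f$$

and the claim follows. □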


Combining this formula with Corollary 11.10.3, one immediately ob- 
tains the following familiar formula: 


Proposition 11.10.7 (Change of variables formula III). Let $[a,b]$ be
a closed interval, and let $\phi : [a,b] \to [\phi(a), \phi(b)]$ be a differentiable
monotone increasing function such that $\phi'$ is Riemann integrable. Let
$f : [\phi(a), \phi(b)] \to \mathbf{R}$ be a Riemann integrable function on $[\phi(a), \phi(b)]$.
Then $(f \circ \phi)\phi' : [a,b] \to \mathbf{R}$ is Riemann integrable on $[a,b]$, and

$$\int_{[a,b]} (f \circ \phi)\phi' = \int_{[\phi(a),\phi(b)]} f.$$
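
The change of variables formula can be checked numerically on a concrete case. The sketch below takes $\phi(x) = x^2$ on $[0,2]$ (monotone increasing and differentiable there) and $f = \cos$; these choices and the midpoint-rule integrator are assumptions made for the example. Both sides should be close to $\sin 4$.

```python
import math


def integral(f, a, b, n=20_000):
    """Midpoint-rule approximation to the Riemann integral of f on [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


def phi(x):
    """Sample change of variables; monotone increasing on [0, 2]."""
    return x * x


def dphi(x):
    """Derivative of phi."""
    return 2 * x


f = math.cos  # sample Riemann integrable function on [phi(0), phi(2)] = [0, 4]


def change_of_variables_sides(a, b):
    """Return (integral of (f o phi) * phi' on [a, b], integral of f on [phi(a), phi(b)])."""
    lhs = integral(lambda x: f(phi(x)) * dphi(x), a, b)
    rhs = integral(f, phi(a), phi(b))
    return lhs, rhs


if __name__ == "__main__":
    print(change_of_variables_sides(0.0, 2.0), math.sin(4.0))
```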


— Exercises — 

Exercise 11.10.1. Prove Proposition 11.10.1. (Hint: first use Corollary 11.5.2 
and Theorem 11.4.5 to show that FG' and F'G are Riemann integrable. Then 
use the product rule (Theorem 10.1.13(d)).) 

Exercise 11.10.2. Fill in the gaps marked (why?) in the proof of Lemma 11.10.5. 

Exercise 11.10.3. Let $a < b$ be real numbers, and let $f : [a,b] \to \mathbf{R}$ be a
Riemann integrable function. Let $g : [-b,-a] \to \mathbf{R}$ be defined by $g(x) :=
f(-x)$. Show that $g$ is also Riemann integrable, and $\int_{[-b,-a]} g = \int_{[a,b]} f$.

Exercise 11.10.4. What is the analogue of Proposition 11.10.7 when $\phi$ is monotone
decreasing instead of monotone increasing? (When $\phi$ is neither monotone
increasing nor monotone decreasing, the situation becomes significantly more
complicated.)
Appendix: the basics of mathematical logic


The purpose of this appendix is to give a quick introduction to mathe- 
matical logic , which is the language one uses to conduct rigorous math- 
ematical proofs. Knowing how mathematical logic works is also very 
helpful for understanding the mathematical way of thinking, which once 
mastered allows you to approach mathematical concepts and problems 
in a clear and confident way - including many of the proof-type questions
in this text. 

Writing logically is a very useful skill. It is somewhat related to, 
but not the same as, writing clearly, or efficiently, or convincingly, or 
informatively; ideally one would want to do all of these at once, but 
sometimes one has to make compromises, though with practice you’ll 
be able to achieve more of your writing objectives concurrently. Thus a 
logical argument may sometimes look unwieldy, excessively complicated, 
or otherwise appear unconvincing. The big advantage of writing logi- 
cally, however, is that one can be absolutely sure that your conclusion 
will be correct, as long as all your hypotheses were correct and your 
steps were logical; using other styles of writing one can be reasonably 
convinced that something is true, but there is a difference between being 
convinced and being sure. 

Being logical is not the only desirable trait in writing, and in fact 
sometimes it gets in the way; mathematicians for instance often resort 
to short informal arguments which are not logically rigorous when they 
want to convince other mathematicians of a statement without going 
through all of the long details, and the same is true of course for non- 
mathematicians as well. So saying that a statement or argument is “not 
logical” is not necessarily a bad thing; there are often many situations 
when one has good reasons not to be emphatic about being logical. How- 
ever, one should be aware of the distinction between logical reasoning 
and more informal means of argument, and not try to pass off an illogical
argument as being logically rigorous. In particular, if an exercise is
asking for a proof, then it is expecting you to be logical in your answer. 

Logic is a skill that needs to be learnt like any other, but this skill 
is also innate to all of you - indeed, you probably use the laws of logic 
unconsciously in your everyday speech and in your own internal (non- 
mathematical) reasoning. However, it does take a bit of training and 
practice to recognize this innate skill and to apply it to abstract situa- 
tions such as those encountered in mathematical proofs. Because logic 
is innate, the laws of logic that you learn should make sense - if you find 
yourself having to memorize one of the principles or laws of logic here, 
without feeling a mental “click” or comprehending why that law should 
work, then you will probably not be able to use that law of logic correctly 
and effectively in practice. So, please don’t study this appendix the way 
you might cram before a final - that is going to be useless. Instead, put 
away your highlighter pen, and read and understand this appendix 
rather than merely studying it! 

A.1 Mathematical statements

Any mathematical argument proceeds in a sequence of mathematical 
statements. These are precise statements concerning various mathemat- 
ical objects (numbers, vectors, functions, etc.) and relations between 
them (addition, equality, differentiation, etc.). These objects can either 
be constants or variables; more on this later. Statements 1 are either 
true or false. 

Example A.1.1. $2 + 2 = 4$ is a true statement; $2 + 2 = 5$ is a false
statement.

Not every combination of mathematical symbols is a statement. For 
instance, 

= 2 + +4 = — = 2 

is not a statement; we sometimes call it ill-formed or ill-defined. The 
statements in the previous example are well-formed or well-defined. 
Thus well-formed statements can be either true or false; ill-formed
statements are considered to be neither true nor false (in fact, they are
usually not considered statements at all). A more subtle example of an
ill-formed statement is

0/0 = 1;

division by zero is undefined, and so the above statement is ill-formed.
A logical argument should not contain any ill-formed statements, thus
for instance if an argument uses a statement such as x/y = z, it needs to
first ensure that y is not equal to zero. Many purported proofs of “0=1”
or other false statements rely on overlooking this “statements must be
well-formed” criterion.

1 More precisely, statements with no free variables are either true or false. We shall
discuss free variables later on in this appendix.

Many of you have probably written ill-formed or otherwise inaccurate 
statements in your mathematical work, while intending to mean some 
other, well-formed and accurate statement. To a certain extent this is 
permissible - it is similar to misspelling some words in a sentence, or 
using a slightly inaccurate or ungrammatical word in place of a correct 
one (“She ran good” instead of “She ran well”). In many cases, the 
reader (or grader) can detect this mis-step and correct for it. However, 
it looks unprofessional and suggests that you may not know what you 
are talking about. And if indeed you actually do not know what you are 
talking about, and are applying mathematical or logical rules blindly, 
then writing an ill-formed statement can quickly confuse you into writing 
more and more nonsense - usually of the sort which receives no credit in 
grading. So it is important, especially when just learning a subject, to 
take care in keeping statements well-formed and precise. Once you have 
more skill and confidence, of course you can afford once again to speak 
loosely, because you will know what you are doing and won’t be in as 
much danger of veering off into nonsense. 

One of the basic axioms of mathematical logic is that every well- 
formed statement is either true or false, but not both. (Though if there 
are free variables, the truth of a statement may depend on the values of 
these variables. More on this later.) Furthermore, the truth or falsity 
of a statement is intrinsic to the statement, and does not depend on the 
opinion of the person viewing the statement (as long as all the definitions 
and notations are agreed upon, of course). So to prove that a statement 
is true, it suffices to show that it is not false, while to show that a state- 
ment is false, it suffices to show that it is not true; this is the principle 
underlying the powerful technique of proof by contradiction , which we 
discuss later. This axiom is viable as long as one is working with pre- 
cise concepts, for which the truth or falsity can be determined (at least 
in principle) in an objective and consistent manner. However, if one is 
working in very non-mathematical situations, then this axiom becomes 
much more dubious, and so it can be a mistake to apply mathematical
logic to non-mathematical situations. (For instance, a statement such as 
“this rock weighs 52 pounds” is reasonably precise and objective, and so 
it is fairly safe to use mathematical reasoning to manipulate it, whereas 
vague statements such as “this rock is heavy”, “this piece of music is 
beautiful” or “God exists” are much more problematic. So while math- 
ematical logic is a very useful and powerful tool, it still does have some 
limitations of applicability.) One can still attempt to apply logic (or 
principles similar to logic) in these cases (for instance, by creating a 
mathematical model of a real-life phenomenon), but this is now science 
or philosophy, not mathematics, and we will not discuss it further here. 

Remark A.1.2. There are other models of logic which attempt to deal
with statements that are not definitely true or definitely false, such as 
modal logic, intuitionist logic, or fuzzy logic, but these are well beyond 
the scope of this text. 


Being true is different from being useful or efficient. For instance, 
the statement 


2 = 2 


is true but unlikely to be very useful. The statement 


4 < 4 


is also true, but not very efficient (the statement 4 = 4 is more precise). 
It may also be that a statement may be false yet still be useful, for 
instance 

$\pi = 22/7$

is false, but is still useful as a first approximation. In mathematical 
reasoning, we only concern ourselves with truth rather than usefulness 
or efficiency; the reason is that truth is objective (everybody can agree 
on it) and we can deduce true statements from precise rules, whereas 
usefulness and efficiency are to some extent matters of opinion, and do 
not follow precise rules. Also, even if some of the individual steps in 
an argument may not seem very useful or efficient, it is still possible 
(indeed, quite common) for the final conclusion to be quite non-trivial 
(i.e., not obviously true) and useful. 

Statements are different from expressions. Statements are true or 
false; expressions are a sequence of mathematical symbols which pro- 
duces some mathematical object (a number, matrix, function, set, etc.) 
as its value. For instance

2 + 3*5 

is an expression, not a statement; it produces a number as its value. 
Meanwhile, 

2 + 3*5 = 17 

is a statement, not an expression. Thus it does not make any sense to 
ask whether 2 + 3*5 is true or false. As with statements, expressions 
can be well-defined or ill-defined; 2 + 3/0, for instance, is ill-defined. 
More subtle examples of ill-defined expressions arise when, for instance, 
attempting to add a vector to a matrix, or evaluating a function outside 
of its domain, e.g., $\sin^{-1}(2)$.

One can make statements out of expressions by using relations such
as $=$, $<$, $>$, $\in$, $\subset$, etc. or by using properties (such as “is prime”, “is
continuous”, “is invertible”, etc.). For instance, “30 + 5 is prime” is a
statement, as is “$30 + 5 < 42 - 7$”. Note that mathematical statements
are allowed to contain English words.

One can make a compound statement from more primitive statements 
by using logical connectives such as and, or, not, if-then, if-and-only-if, 
and so forth. We give some examples below, in decreasing order of 
intuitiveness. 

Conjunction. If X is a statement and Y is a statement, the statement
“X and Y” is true if X and Y are both true, and is false otherwise.
For instance, “2 + 2 = 4 and 3 + 3 = 6” is true, while “2 + 2 = 4 and 
3 + 3 = 5” is not. Another example: “2 + 2 = 4 and 2 + 2 = 4” is true, 
even if it is a bit redundant; logic is concerned with truth, not efficiency. 

Due to the expressiveness of the English language, one can reword
the statement “X and Y” in many ways, e.g., “X and also Y”, or “Both
X and Y are true”, etc. Interestingly, the statement “X, but Y” is
logically the same statement as “X and Y”, but they have different
connotations (both statements affirm that X and Y are both true, but
the first version suggests that X and Y are in contrast to each other,
while the second version suggests that X and Y support each other).
Again, logic is about truth, not about connotations or suggestions.

Disjunction. If X is a statement and Y is a statement, the statement
“X or Y” is true if either X or Y is true, or both. For instance, “2 + 2 = 4
or 3 + 3 = 5” is true, but “2 + 2 = 5 or 3 + 3 = 5” is not. Also “2 + 2 = 4
or 3 + 3 = 6” is true (even if it is a bit inefficient; it would be a stronger
statement to say “2 + 2 = 4 and 3 + 3 = 6”). Thus by default, the word
“or” in mathematical logic defaults to inclusive or. The reason we do
this is that with inclusive or, to verify “X or Y”, it suffices to verify that
just one of X or Y is true; we don’t need to show that the other one is
false. So we know, for instance, that “2 + 2 = 4 or 2353 + 5931 = 7284” is
true without having to look at the second equation. As in the previous
discussion, the statement “2 + 2 = 4 or 2 + 2 = 4” is true, even if it is
highly inefficient.

If one really does want to use exclusive or, use a statement such as 
“Either X or Y is true, but not both” or “Exactly one of X or Y is 
true”. Exclusive or does come up in mathematics, but nowhere near as 
often as inclusive or. 

Negation. The statement “X is not true” or “X is false”, or “It is not
the case that X”, is called the negation of X, and is true if and only if X
is false, and is false if and only if X is true. For instance, the statement
“It is not the case that 2 + 2 = 5” is a true statement. Of course we
could abbreviate this statement to “$2 + 2 \neq 5$”.

Negations convert “and” into “or”. For instance, the negation of
“Jane Doe has black hair and Jane Doe has blue eyes” is “Jane Doe
doesn’t have black hair or doesn’t have blue eyes”, not “Jane Doe doesn’t
have black hair and doesn’t have blue eyes” (can you see why?). Similarly,
if x is an integer, the negation of “x is even and non-negative” is
“x is odd or negative”, not “x is odd and negative”. (Note how it is
important here that or is inclusive rather than exclusive.) Or the negation
of “$x \geq 2$ and $x \leq 6$” (i.e., “$2 \leq x \leq 6$”) is “$x < 2$ or $x > 6$”, not “$x < 2$
and $x > 6$” or “$2 < x > 6$”.

Similarly, negations convert “or” into “and”. The negation of “John
Doe has brown hair or black hair” is “John Doe does not have brown hair
and does not have black hair”, or equivalently “John Doe has neither
brown nor black hair”. If x is a real number, the negation of “$x \geq 1$ or
$x \leq -1$” is “$x < 1$ and $x > -1$” (i.e., $-1 < x < 1$).
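
The two rules for negating “and” and “or” can be verified mechanically by running through all four combinations of truth values. The following Python sketch does this, and also checks the rule on an interval condition of the kind discussed above; the function names are hypothetical and chosen only for this illustration.

```python
from itertools import product


def de_morgan_holds():
    """Check both negation rules for every assignment of truth values."""
    for X, Y in product((True, False), repeat=2):
        # Negation turns "and" into "or" ...
        if (not (X and Y)) != ((not X) or (not Y)):
            return False
        # ... and "or" into "and".
        if (not (X or Y)) != ((not X) and (not Y)):
            return False
    return True


def in_range(x):
    """The statement "x >= 2 and x <= 6"."""
    return x >= 2 and x <= 6


def negated(x):
    """Its claimed negation, "x < 2 or x > 6"."""
    return x < 2 or x > 6


if __name__ == "__main__":
    print(de_morgan_holds())
    print(all(negated(x) == (not in_range(x)) for x in range(-5, 12)))
```

Note that Python's `or` is inclusive, matching the convention adopted in this appendix.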

It is quite possible that a negation of a statement will produce a 
statement which could not possibly be true. For instance, if x is an 
integer, the negation of “x is either even or odd” is “x is neither even 
nor odd” , which cannot possibly be true. Remember, though, that even 
if a statement is false, it is still a statement, and it is definitely possible 
to arrive at a true statement using an argument which at times involves 
false statements. (Proofs by contradiction, for instance, fall into this 
category. Another example is proof by dividing into cases. If one divides 
into three mutually exclusive cases, Case 1, Case 2, and Case 3, then 
at any given time two of the cases will be false and only one will be
true, however this does not necessarily mean that the proof as a whole 
is incorrect or that the conclusion is false.) 

Negations are sometimes unintuitive to work with, especially if there 
are multiple negations; a statement such as “It is not the case that either 
x is not odd, or x is not larger than or equal to 3, but not both” is not 
particularly pleasant to use. Fortunately, one rarely has to work with 
more than one or two negations at a time, since often negations cancel 
each other. For instance, the negation of “X is not true” is just “X
is true”, or more succinctly just “X”. Of course one should be careful
when negating more complicated expressions because of the switching 
of “and” and “or”, and similar issues. 

If and only if (iff). If X is a statement, and Y is a statement, we say 
that “X is true if and only if Y is true” if, whenever X is true, Y has 
to be true also, and whenever Y is true, X has to be true also (i.e., X 
and Y are “equally true”). Other ways of saying the same thing are “X and 
Y are logically equivalent statements”, or “X is true iff Y is true”, or 
“X ⟺ Y”. Thus for instance, if x is a real number, then the statement 
“x = 3 if and only if 2x = 6” is true: this means that whenever x = 3 
is true, then 2x = 6 is true, and whenever 2x = 6 is true, then x = 3 is 
true. On the other hand, the statement “x = 3 if and only if x^2 = 9” 
is false; while it is true that whenever x = 3 is true, x^2 = 9 is also 
true, it is not the case that whenever x^2 = 9 is true, x = 3 is also 
automatically true (think of what happens when x = -3). 
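
The two examples above can be tested on a small range of integers. This is only a sketch under the assumption that a finite sample of integers is representative enough to expose the failure; the helper name `iff` and the sample range are illustrative.

```python
def iff(p: bool, q: bool) -> bool:
    # "p if and only if q" holds exactly when p and q have the same truth value.
    return p == q

candidates = range(-10, 11)

# "x = 3 if and only if 2x = 6": holds at every sampled integer.
first_holds = all(iff(x == 3, 2 * x == 6) for x in candidates)

# "x = 3 if and only if x^2 = 9": collect the integers where it fails.
failures = [x for x in candidates if not iff(x == 3, x * x == 9)]

print(first_holds, failures)
```

The only failure in the sample is the value x = -3 mentioned in the text, where x^2 = 9 is true but x = 3 is false.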

Statements that are equally true are also equally false: if X and 
Y are logically equivalent, and X is false, then Y has to be false also 
(because if Y were true, then X would also have to be true). Conversely, 
any two statements which are equally false will also be logically 
equivalent. Thus for instance 2 + 2 = 5 if and only if 4 + 4 = 10. 

Sometimes it is of interest to show that more than two statements 
are logically equivalent; for instance, one might want to assert that three 
statements X, Y, and Z are all logically equivalent. This means when- 
ever one of the statements is true, then all of the statements are true; 
and it also means that if one of the statements is false, then all of the 
statements are false. This may seem like a lot of logical implications 
to prove, but in practice, once one demonstrates enough logical 
implications between X, Y, and Z, one can often deduce all the others 
and conclude that they are all logically equivalent. See for instance 
Exercises A.1.5, A.1.6. 

— Exercises — 

Exercise A.1.1. What is the negation of the statement “either X is true, or Y 
is true, but not both”? 

Exercise A.1.2. What is the negation of the statement “X is true if and only 
if Y is true”? (There may be multiple ways to phrase this negation.) 

Exercise A.1.3. Suppose that you have shown that whenever X is true, then Y 
is true, and whenever X is false, then Y is false. Have you now demonstrated 
that X and Y are logically equivalent? Explain. 

Exercise A.1.4. Suppose that you have shown that whenever X is true, then Y 
is true, and whenever Y is false, then X is false. Have you now demonstrated 
that X is true if and only if Y is true? Explain. 

Exercise A.1.5. Suppose you know that X is true if and only if Y is true, and 
you know that Y is true if and only if Z is true. Is this enough to show that 
X, Y, Z are all logically equivalent? Explain. 

Exercise A.1.6. Suppose you know that whenever X is true, then Y is true; 
that whenever Y is true, then Z is true; and whenever Z is true, then X is 
true. Is this enough to show that X, Y, Z are all logically equivalent? Explain. 

A.2 Implication 

Now we come to the least intuitive of the commonly used logical 
connectives - implication. If X is a statement, and Y is a statement, then 
“if X, then Y” is the implication from X to Y; it is also written “when 
X is true, Y is true”, or “X implies Y” or “Y is true when X is” or 
“X is true only if Y is true” (this last one takes a bit of mental effort to 
see). What this statement “if X, then Y” means depends on whether 
X is true or false. If X is true, then “if X, then Y” is true when Y is 
true, and false when Y is false. If however X is false, then “if X, then 
Y” is always true, regardless of whether Y is true or false! To put it 
another way, when X is true, the statement “if X, then Y” implies that 
Y is true. But when X is false, the statement “if X, then Y” offers no 
information about whether Y is true or not; the statement is true, but 
vacuous (i.e., does not convey any new information beyond the fact that 
the hypothesis is false). 

Examples A.2.1. If x is an integer, then the statement “If x = 2, then 
x^2 = 4” is true, regardless of whether x is actually equal to 2 or not 
(though this statement is only likely to be useful when x is equal to 2). 
This statement does not assert that x is equal to 2, and does not assert 
that x^2 is equal to 4, but it does assert that when and if x is equal to 2, 
then x^2 is equal to 4. If x is not equal to 2, the statement is still true 
but offers no conclusion on x or x^2. 

Some special cases of the above implication: the implication “If 2 = 
2, then 2^2 = 4” is true (true implies true). The implication “If 3 = 2, 
then 3^2 = 4” is true (false implies false). The implication “If -2 = 2, 
then (-2)^2 = 4” is true (false implies true). The latter two implications 
are considered vacuous - they do not offer any new information since 
their hypothesis is false. (Nevertheless, it is still possible to employ 
vacuous implications to good effect in a proof - a vacuously true statement 
is still true. We shall see one such example shortly.) 

As we see, the falsity of the hypothesis does not destroy the truth 
of an implication, in fact it is just the opposite! (When a hypothesis is 
false, the implication is automatically true.) The only way to disprove 
an implication is to show that the hypothesis is true while the conclusion 
is false. Thus “If 2 + 2 = 4, then 4 + 4 = 2” is a false implication. (True 
does not imply false.) 
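
The truth-value rule for implication described above can be written as a one-line function and tabulated. The sketch below is illustrative only: the name `implies` is a hypothetical helper, and the encoding “(not X) or Y” is one standard way to express the rule that an implication fails only when the hypothesis is true and the conclusion is false.

```python
def implies(x: bool, y: bool) -> bool:
    # False only when the hypothesis x is true and the conclusion y is false.
    return (not x) or y

# Full truth table, keyed by (hypothesis, conclusion).
table = {(x, y): implies(x, y) for x in (True, False) for y in (True, False)}

# The special cases discussed in the text.
true_implies_true = implies(2 == 2, 2 ** 2 == 4)
false_implies_false = implies(3 == 2, 3 ** 2 == 4)
false_implies_true = implies(-2 == 2, (-2) ** 2 == 4)
true_implies_false = implies(2 + 2 == 4, 4 + 4 == 2)

for row, value in table.items():
    print(row, value)
```

Only the last of the four special cases evaluates to false, matching the statement that the sole way to disprove an implication is to exhibit a true hypothesis with a false conclusion.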

One can also think of the statement “if X, then Y” as “Y is at least 
as true as X” - if X is true, then Y also has to be true, but if X is 
false, Y could be as false as X, but it could also be true. This should 
be compared with “X if and only if Y”, which asserts that X and Y are 
equally true. 

Vacuously true implications are often used in ordinary speech, some- 
times without knowing that the implication is vacuous. A somewhat 
frivolous example is “If wishes were wings, then pigs would fly”. (The 
statement “hell freezes over” is also a popular choice for a false hypothe- 
sis.) A more serious one is “If John had left work at 5pm, then he would 
be here by now.” This kind of statement is often used in a situation in 
which the conclusion and hypothesis are both false; but the implication 
is still true regardless. This statement, by the way, can be used to il- 
lustrate the technique of proof by contradiction: if you believe that “If 
John had left work at 5pm, then he would be here by now”, and you 
also know that “John is not here by now”, then you can conclude that 
“John did not leave work at 5pm”, because John leaving work at 5pm 
would lead to a contradiction. Note how a vacuous implication can be 
used to derive a useful truth. 

To summarize, implications are sometimes vacuous, but this is not 
actually a problem in logic, since these implications are still true, and 
vacuous implications can still be useful in logical arguments. In particular, 
one can safely use statements like “If X, then Y” without necessarily 
having to worry about whether the hypothesis X is actually true or not 
(i.e., whether the implication is vacuous or not). 

Implications can also be true even when there is no causal link be- 
tween the hypothesis and conclusion. The statement “If 1 + 1 = 2, then 
Washington D.C. is the capital of the United States” is true (true im- 
plies true), although rather odd; the statement “If 2 + 2 = 3, then New 
York is the capital of the United States” is similarly true (false implies 
false). Of course, such a statement may be unstable (the capital of the 
United States may one day change, while 1 + 1 will always remain equal 
to 2) but it is true, at least for the moment. While it is possible to 
use acausal implications in a logical argument, it is not recommended 
as it can cause unneeded confusion. (Thus, for instance, while it is true 
that a false statement can be used to imply any other statement, true or 
false, doing so arbitrarily would probably not be helpful to the reader.) 

To prove an implication “If X, then Y”, the usual way to do this 
is to first assume that X is true, and use this (together with whatever 
other facts and hypotheses you have) to deduce Y. This is still a valid 
procedure even if X later turns out to be false; the implication does not 
guarantee anything about the truth of X, and only guarantees the truth 
of Y conditionally on X first being true. For instance, the following is 
a valid proof of a true proposition, even though both hypothesis and 
conclusion of the proposition are false: 

Proposition A.2.2. If 2 + 2 = 5, then 4 = 10 - 4. 

Proof. Assume 2 + 2 = 5. Multiplying both sides by 2, we obtain 4 + 4 = 
10. Subtracting 4 from both sides, we obtain 4 = 10 - 4 as desired. □ 

On the other hand, a common error is to prove an implication by 
first assuming the conclusion and then arriving at the hypothesis. For 
instance, the following Proposition is correct, but the proof is not: 

Proposition A.2.3. Suppose that 2x + 3 = 7. Show that x = 2. 

Proof. (Incorrect) x = 2; so 2x = 4; so 2x + 3 = 7. □ 

When doing proofs, it is important that you are able to distinguish 
the hypothesis from the conclusion; there is a danger of getting hope- 
lessly confused if this distinction is not clear. 

Here is a short proof which uses implications which are possibly 
vacuous. 

Theorem A.2.4. Suppose that n is an integer. Then n(n + 1) is an 
even integer. 

Proof. Since n is an integer, n is even or odd. If n is even, then n(n + 1) 
is also even, since any multiple of an even number is even. If n is odd, 
then n + 1 is even, which again implies that n(n + 1) is even. Thus in 
either case n(n + 1) is even, and we are done. □ 

Note that this proof relied on two implications: “if n is even, then 
n(n + 1) is even”, and “if n is odd, then n(n + 1) is even”. Since n 
cannot be both odd and even, at least one of these implications has 
a false hypothesis and is therefore vacuous. Nevertheless, both these 
implications are true, and one needs both of them in order to prove the 
theorem, because we don’t know in advance whether n is even or odd. 
And even if we did, it might not be worth the trouble to check it. For 
instance, as a special case of this theorem we immediately know 

Corollary A.2.5. Let n = (253 + 142) * 123 - (423 + 198)^342 + 538 - 213. 
Then n(n + 1) is an even integer. 

In this particular case, one can work out exactly which parity n is - 
even or odd - and then use only one of the two implications in the above 
Theorem, discarding the vacuous one. This may seem like it is more 
efficient, but it is a false economy, because one then has to determine 
what parity n is, and this requires a bit of effort - more effort than it 
would take if we had just left both implications, including the vacuous 
one, in the argument. So, somewhat paradoxically, the inclusion of vacu- 
ous, false, or otherwise “useless” statements in an argument can actually 
save you effort in the long run! (I’m not suggesting, of course, that you 
ought to pack your proofs with lots of time-wasting and irrelevant statements; 
all I’m saying here is that you need not be unduly concerned that 
some hypotheses in your argument might not be correct, as long as your 
argument is still structured to give the correct conclusion regardless of 
whether those hypotheses were true or false.) 
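
As a sanity check of Theorem A.2.4 and its corollary, one can let a computer evaluate n(n + 1) directly without ever deciding the parity of n by hand. The sketch below is illustrative: the function name is hypothetical, and reading the 342 in the corollary as an exponent is an assumption about the intended expression. The theorem applies to any integer n, so the conclusion does not depend on that reading.

```python
def product_is_even(n: int) -> bool:
    # Checks the conclusion of the theorem: n(n + 1) is an even integer.
    return (n * (n + 1)) % 2 == 0

# ASSUMPTION: the 342 in the corollary is read as an exponent.
n = (253 + 142) * 123 - (423 + 198) ** 342 + 538 - 213

corollary_holds = product_is_even(n)

# The same check on a range of integers covering both parities and both signs.
theorem_holds_on_sample = all(product_is_even(k) for k in range(-1000, 1001))

print(corollary_holds, theorem_holds_on_sample)
```

The check on the sample is evidence, not a proof; the proof is the two-case argument given in the text.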

The statement “If X, then Y” is not the same as “If Y, then X”; 
for instance, while “If x = 2, then x^2 = 4” is true, “If x^2 = 4, then 
x = 2” can be false if x is equal to -2. These two statements are called 
converses of each other; thus the converse of a true implication is not 
necessarily another true implication. We use the statement “X if and 
only if Y” to denote the statement that “If X, then Y; and if Y, then 
X”. Thus for instance, we can say that x = 2 if and only if 2x = 4, 
because if x = 2 then 2x = 4, while if 2x = 4 then x = 2. One way 
of thinking about an if-and-only-if statement is to view “X if and only 
if Y” as saying that X is just as true as Y; if one is true then so is 
the other, and if one is false, then so is the other. For instance, the 
statement “If 3 = 2, then 6 = 4” is true, since both hypothesis and 
conclusion are false. (Under this view, “If X, then Y” can be viewed as 
a statement that Y is at least as true as X.) Thus one could say “X 
and Y are equally true” instead of “X if and only if Y”. 

Similarly, the statement “If X is true, then Y is true” is not the 
same as “If X is false, then Y is false”. Saying that “if x = 2, then 
x^2 = 4” does not imply that “if x ≠ 2, then x^2 ≠ 4”, and indeed we 
have x = -2 as a counterexample in this case. If-then statements are 
not the same as if-and-only-if statements. (If we knew that “X is true 
if and only if Y is true”, then we would also know that “X is false if 
and only if Y is false”.) The statement “If X is false, then Y is false” is 
sometimes called the inverse of “If X is true, then Y is true”; thus the 
inverse of a true implication is not necessarily a true implication. 

If you know that “If X is true, then Y is true”, then it is also true 
that “If Y is false, then X is false” (because if Y is false, then X can’t be 
true, since that would imply Y is true, a contradiction). For instance, 
if we knew that “If x = 2, then x^2 = 4”, then we also know that “If 
x^2 ≠ 4, then x ≠ 2”. Or if we knew “If John had left work at 5pm, he 
would be here by now”, then we also know “If John isn’t here now, then 
he could not have left work at 5pm”. The statement “If Y is false, then 
X is false” is known as the contrapositive of “If X, then Y” and both 
statements are equally true. 
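
The relationship between an implication and its converse, inverse, and contrapositive can be compared on all four truth assignments. This is a minimal sketch; `implies` is a hypothetical helper encoding “if X, then Y” as “(not X) or Y”.

```python
from itertools import product

def implies(x: bool, y: bool) -> bool:
    # "If x, then y": false only when x is true and y is false.
    return (not x) or y

rows = list(product([True, False], repeat=2))

# Contrapositive "if not Y, then not X": agrees with "if X, then Y" everywhere.
contrapositive_agrees = all(
    implies(x, y) == implies(not y, not x) for x, y in rows
)

# Converse "if Y, then X": list the assignments (X, Y) where it disagrees.
converse_disagrees = [(x, y) for x, y in rows if implies(x, y) != implies(y, x)]

# Inverse "if not X, then not Y": list the assignments where it disagrees.
inverse_disagrees = [
    (x, y) for x, y in rows if implies(x, y) != implies(not x, not y)
]

print(contrapositive_agrees, converse_disagrees, inverse_disagrees)
```

The converse and the inverse disagree with the original implication on exactly the same rows, which reflects the fact that they are contrapositives of each other.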

In particular, if you know that X implies something which is known 
to be false, then X itself must be false. This is the idea behind proof 
by contradiction or reductio ad absurdum: to show something must be 
false, assume first that it is true, and show that this implies something 
which you know to be false (e.g., that a statement is simultaneously true 
and not true). For instance: 

Proposition A.2.6. Suppose that x is a positive number such that 
sin(x) = 1. Then x ≥ π/2. 

Proof. Suppose for sake of contradiction that x < π/2. Since x is 
positive, we thus have 0 < x < π/2. Since sin(x) is increasing for 
0 < x < π/2, and sin(0) = 0 and sin(π/2) = 1, we thus have 
0 < sin(x) < 1. But this contradicts the hypothesis that sin(x) = 1. 
Hence x ≥ π/2. □ 

Note that one feature of proof by contradiction is that at some point 
in the proof you assume a hypothesis (in this case, that x < π/2) which 
later turns out to be false. Note however that this does not alter the 
fact that the argument remains valid, and that the conclusion is true; 
this is because the ultimate conclusion does not rely on that hypothesis 
being true (indeed, it relies instead on it being false!). 

Proof by contradiction is particularly useful for showing “negative” 
statements - that X is false, that a is not equal to b, that kind of thing. 
But the line between positive and negative statements is sort of blurry. 
(Is the statement x ≥ 2 a positive or negative statement? What about 
its negation, that x < 2?) So this is not a hard and fast rule. 

Logicians often use special symbols to denote logical connectives; for 
instance “X implies Y” can be written “X ⟹ Y”, “X is not true” can 
be written “~X”, “!X”, or “¬X”, “X and Y” can be written “X ∧ Y” 
or “X&Y”, and so forth. But for general-purpose mathematics, these 
symbols are not often used; English words are often more readable, and 
don’t take up much more space. Also, using these symbols tends to 
blur the line between expressions and statements; it’s not as easy to 
understand “((x = 3) ∧ (y = 5)) ⟹ (x + y = 8)” as “If x = 3 and 
y = 5, then x + y = 8”. So in general I would not recommend using these 
symbols (except possibly for ⟹, which is a very intuitive symbol). 

A.3 The structure of proofs 

To prove a statement, one often starts by assuming the hypothesis and 
working one’s way toward a conclusion; this is the direct approach to 
proving a statement. Such a proof might look something like the follow- 
ing: 

Proposition A.3.1. A implies B. 

Proof. Assume A is true. Since A is true, C is true. Since C is true, D 
is true. Since D is true, B is true, as desired. □ 

An example of such a direct approach is 

Proposition A.3.2. If x = π, then sin(x/2) + 1 = 2. 

Proof. Let x = π. Since x = π, we have x/2 = π/2. Since x/2 = π/2, 
we have sin(x/2) = 1. Since sin(x/2) = 1, we have sin(x/2) + 1 = 2. □ 

In the above proof, we started at the hypothesis and moved steadily 
from there toward a conclusion. It is also possible to work backwards 
from the conclusion and see what it would take to imply it. For 
instance, a typical proof of Proposition A.3.1 of this sort might look like 
the following: 

Proof. To show B, it would suffice to show D. Since C implies D, we 
just need to show C. But C follows from A. □ 

As an example of this, we give another proof of Proposition A.3.2: 

Proof. To show sin(x/2) + 1 = 2, it would suffice to show that sin(x/2) = 
1. Since x/2 = π/2 would imply sin(x/2) = 1, we just need to show that 
x/2 = π/2. But this follows since x = π. □ 

Logically speaking, the above two proofs of Proposition A.3.2 are 
the same, just arranged differently. Note how this proof style is different 
from the (incorrect) approach of starting with the conclusion and seeing 
what it would imply (as in Proposition A.2.3); instead, we start with 
the conclusion and see what would imply it. 

Another example of a proof written in this backwards style is the 
following: 

Proposition A.3.3. Let 0 < r < 1 be a real number. Then the series 
∑_{n=1}^∞ n r^n is convergent. 

Proof. To show this series is convergent, it suffices by the ratio test to 
show that the ratio 

|(n + 1) r^(n+1) / (n r^n)| = r (n + 1)/n 

converges to something less than 1 as n → ∞. Since r is already less 
than 1, it will be enough to show that (n + 1)/n converges to 1. But since 
(n + 1)/n = 1 + 1/n, it suffices to show that 1/n → 0. But this is clear since 
n → ∞. □ 
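
The backwards argument above can be illustrated numerically. The sketch below assumes the series in question is the sum of n r^n over n ≥ 1 and picks r = 0.5 purely as an example; the function names are hypothetical. It evaluates the ratio of consecutive terms for a large n and compares a partial sum with r/(1 - r)^2, the standard closed form for this series, which is not stated in the text.

```python
def term_ratio(n: int, r: float) -> float:
    # Ratio of consecutive terms (n + 1) r^(n+1) / (n r^n), simplified to r (n + 1)/n.
    return r * (n + 1) / n

def partial_sum(terms: int, r: float) -> float:
    # Sum of n * r^n for n = 1, ..., terms.
    return sum(n * r ** n for n in range(1, terms + 1))

r = 0.5  # example value with 0 < r < 1

ratio_at_large_n = term_ratio(10 ** 6, r)  # close to r, hence less than 1
approx_sum = partial_sum(200, r)
closed_form = r / (1 - r) ** 2

print(ratio_at_large_n, approx_sum, closed_form)
```

A numerical experiment of this kind suggests convergence but does not establish it; the ratio test is what supplies the proof.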

One could also do any combination of moving forwards from the hy- 
pothesis and backwards from the conclusion. For instance, the following 
would be a valid proof of Proposition A.3.1: 

Proof. To show B, it would suffice to show D. So now let us show D. 
Since we have A by hypothesis, we have C. Since C implies D, we thus 
have D as desired. □ 

Again, from a logical point of view this is exactly the same proof 
as before. Thus there are many ways to write the same proof down; 
how you do so is up to you, but certain ways of writing proofs are more 
readable and natural than others, and different arrangements tend to 
emphasize different parts of the argument. (Of course, when you are 
just starting out doing mathematical proofs, you’re generally happy to 
get some proof of a result, and don’t care so much about getting the 
“best” arrangement of that proof; but the point here is that a proof can 
take many different forms.) 

The above proofs were pretty simple because there was just one 
hypothesis and one conclusion. When there are multiple hypotheses 
and conclusions, and the proof splits into cases, then proofs can get 
more complicated. For instance a proof might look as tortuous as this: 

Proposition A.3.4. Suppose that A and B are true. Then C and D 
are true. 

Proof. Since A is true, E is true. From E and B we know that F is 
true. Also, in light of A, to show D it suffices to show G. There are now 
two cases: H and I. If H is true, then from F and H we obtain C, and 
from A and H we obtain G. If instead I is true, then from I we have 
G, and from I and G we obtain C. Thus in both cases we obtain both 
C and G, and hence C and D. □ 

Incidentally, the above proof could be rearranged into a much tidier 
manner, but you at least get the idea of how complicated a proof could 
become. To show an implication there are several ways to proceed: you 
can work forward from the hypothesis; you can work backward from the 
conclusion; or you can divide into cases in the hope of splitting the problem 
into several easier sub-problems. Another is to argue by contradiction; 
for instance, you can have an argument of the form 

Proposition A.3.5. Suppose that A is true. Then B is false. 

Proof. Suppose for sake of contradiction that B is true. This would 
imply that C is true. But since A is true, this implies that D is true; 
which contradicts C. Thus B must be false. □ 

As you can see, there are several things to try when attempting a 
proof. With experience, it will become clearer which approaches are 
likely to work easily, which ones will probably work but require much 
effort, and which ones are probably going to fail. In many cases there is 
really only one obvious way to proceed. Of course, there may definitely 
be multiple ways to approach a problem, so if you see more than one way 
to begin a problem, you can just try whichever one looks the easiest, but 
be prepared to switch to another approach if it begins to look hopeless. 

Also, it helps when doing a proof to keep track of which statements 
are known (either as hypotheses, or deduced from the hypotheses, or 
coming from other theorems and results), and which statements are 
desired (either the conclusion, or something which would imply the con- 
clusion, or some intermediate claim or lemma which will be useful in 
eventually obtaining the conclusion). Mixing the two up is almost al- 
ways a bad idea, and can lead to one getting hopelessly lost in a proof. 


A.4 Variables and quantifiers 

One can get quite far in logic just by starting with primitive statements 
(such as “2 + 2 = 4” or “John has black hair”), then forming compound 
statements using logical connectives, and then using various laws of logic 
to pass from one’s hypotheses to one’s conclusions; this is known as 
propositional logic or Boolean logic. (It is possible to list a dozen or so 
such laws of propositional logic, which are sufficient to do everything one 
wants to do, but I have deliberately chosen not to do so here, because 
you might then be tempted to memorize that list, and that is not how 
one should learn how to do logic, unless one happens to be a computer 
or some other non-thinking device. However, if you really are curious 
as to what the formal laws of logic are, look up “laws of propositional 
logic” or something similar in the library or on the internet.) 

However, to do mathematics, this level of logic is insufficient, because 
it does not incorporate the fundamental concept of variables - those 
familiar symbols such as x or n which denote various quantities which 
are unknown, or set to some value, or assumed to obey some property. 
Indeed we have already sneaked in some of these variables in order to 
illustrate some of the concepts in propositional logic (mainly because it 
gets boring after a while to talk endlessly about variable-free statements 
such as 2 + 2 = 4 or “Jane has black hair”). Mathematical logic is thus 
the same as propositional logic but with the additional ingredient of 
variables added. 

A variable is a symbol, such as n or x, which denotes a certain type 
of mathematical object - an integer, a vector, a matrix, that kind of 
thing. In almost all circumstances, the type of object that the variable 
represents should be declared, otherwise it will be difficult to make 
well-formed statements using it. (There are very few true statements that 
one can make about variables without knowing the type of variables 
involved. For instance, given a variable x of any type whatsoever, it is 
true that x = x, and if we also know that x = y, then we can conclude 
that y = x. But one cannot say, for instance, that x + y = y + x, until 
we know what type of objects x and y are and whether they support the 
operation of addition; for instance, the above statement is ill-formed if 
x is a matrix and y is a vector. Thus if one actually wants to do some 
useful mathematics, then every variable should have an explicit type.) 

One can form expressions and statements involving variables, for 
instance, if x is a real variable (i.e., a variable which is a real number), 
x + 3 is an expression, and x + 3 = 5 is a statement. But now the truth 
of a statement may depend on the value of the variables involved; for 
instance the statement x + 3 = 5 is true if x is equal to 2, but is false if 
x is not equal to 2. Thus the truth of a statement involving a variable 
may depend on the context of the statement - in this case, it depends 
on what x is supposed to be. (This is a modification of the rule for 
propositional logic, in which all statements have a definite truth value.) 

Sometimes we do not set a variable to be anything (other than spec- 
ifying its type). Thus, we could consider the statement x + 3 = 5 where 
x is an unspecified real number. In such a case we call this variable 
a free variable; thus we are considering x + 3 = 5 with x a free variable. 
Statements with free variables might not have a definite truth 
value, as they depend on an unspecified variable. For instance, we have 
already remarked that x + 3 = 5 does not have a definite truth value 
if x is a free real variable, though of course for each given value of x 
the statement is either true or false. On the other hand, the statement 
(x + 1)^2 = x^2 + 2x + 1 is true for every real number x, and so we can 
regard this as a true statement even when x is a free variable. 

At other times, we set a variable to equal a fixed value, by using 
a statement such as “Let x = 2” or “Set x equal to 2”. In this case, 
the variable is known as a bound variable, and statements involving only 
bound variables and no free variables do have a definite truth value. 
For instance, if we set x = 342, then the statement “x + 135 = 477” 
now has a definite truth value, whereas if x is a free real variable then 
“x + 135 = 477” could be either true or false, depending on what x 
is. Thus, as we have said before, the truth of a statement such as 
“x + 135 = 477” depends on the context - whether x is free or bound, 
and if it is bound, what it is bound to. 

One can also turn a free variable into a bound variable by using the 
quantifiers “for all” or “for some”. For instance, the statement 

(x + 1)^2 = x^2 + 2x + 1 

is a statement with one free variable x, and need not have a definite 
truth value, but the statement 

(x + 1)^2 = x^2 + 2x + 1 for all real numbers x 

is a statement with one bound variable x, and now has a definite truth 
value (in this case, the statement is true). Similarly, the statement 

x + 3 = 5 

has one free variable, and does not have a definite truth value, but the 
statement 

x + 3 = 5 for some real number x 

is true, since it is true for x = 2. On the other hand, the statement 

x + 3 = 5 for all real numbers x 

is false, because there are some (indeed, there are many) real numbers 
x for which x + 3 is not equal to 5. 
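
The difference between a free variable and a bound one has a rough analogue in code: a statement with a free variable behaves like a function awaiting its argument, while assigning a value or quantifying over a collection produces a definite truth value. The sketch below is only an analogy under stated assumptions: the finite list `sample` stands in for the real numbers, which a program cannot enumerate, and all names are illustrative.

```python
from fractions import Fraction

def statement(x) -> bool:
    # "x + 3 = 5" with x free: no truth value until a value of x is supplied.
    return x + 3 == 5

# ASSUMPTION: a finite sample of rationals (-10, -9.5, ..., 10) stands in for the reals.
sample = [Fraction(k, 2) for k in range(-20, 21)]

bound_by_assignment = statement(2)                   # "Let x = 2"
for_some = any(statement(x) for x in sample)         # "for some x"
for_all = all(statement(x) for x in sample)          # "for all x"

# The identity (x + 1)^2 = x^2 + 2x + 1 holds at every sampled value.
identity_for_all = all((x + 1) ** 2 == x ** 2 + 2 * x + 1 for x in sample)

print(bound_by_assignment, for_some, for_all, identity_for_all)
```

A finite sample can confirm a “for some” statement and can refute a “for all” statement, but it cannot prove a “for all” statement about every real number.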

Universal quantifiers. Let P(x) be some statement depending on a 
free variable x. The statement “P(x) is true for all x of type T” means 
that given any x of type T, the statement P(x) is true regardless of 
what the exact value of x is. In other words, the statement is the same 
as saying “if x is of type T, then P(x) is true”. Thus the usual way to 
prove such a statement is to let x be a free variable of type T (by saying 
something like “Let x be any object of type T”), and then prove P(x) 
for that object. The statement becomes false if one can produce even a 
single counterexample, i.e., an element x which lies in T but for which 
P(x) is false. For instance, the statement “x^2 is greater than x for all 
positive x” can be shown to be false by producing a single example, such 
as x = 1 or x = 1/2, where x^2 is not greater than x. 

On the other hand, producing a single example where P(x) is true 
will not show that P(x) is true for all x. For instance, just because 
the equation x + 3 = 5 has a solution when x = 2 does not imply that 
x + 3 = 5 for all real numbers x; it only shows that x + 3 = 5 is true 
for some real number x. (This is the source of the often-quoted, though 
somewhat inaccurate, slogan “One cannot prove a statement just by 
giving an example”. The more precise statement is that one cannot 
prove a “for all” statement by examples, though one can certainly prove 
“for some” statements this way, and one can also disprove “for all” 
statements by a single counterexample.) 
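
The asymmetry between examples and counterexamples can be seen on the claim “x^2 is greater than x for all positive x”. In the sketch below, the list of test values and the function name are illustrative choices; exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction

def claim(x) -> bool:
    # The statement "x^2 is greater than x" for one particular x.
    return x * x > x

positives = [Fraction(1, 2), Fraction(1), Fraction(2), Fraction(3)]

# Values where the claim holds: these do not prove the "for all" statement.
supporting = [x for x in positives if claim(x)]

# Values where the claim fails: any one of these disproves it.
counterexamples = [x for x in positives if not claim(x)]

print(supporting, counterexamples)
```

The supporting values x = 2 and x = 3 settle nothing, while either of the counterexamples x = 1/2 or x = 1 is enough on its own to show that the “for all” statement is false.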

It occasionally happens that there are in fact no variables x of type T. 
In that case the statement “P(x) is true for all x of type T” is vacuously 
true - it is true but has no content, similar to a vacuous implication. 
For instance, the statement 

6 < 2x < 4 for all 3 < x < 2 

is true, and easily proven, but is vacuous. (Such a vacuously true state- 
ment can still be useful in an argument, though this doesn’t happen very 
often.) 

One can use phrases such as “For every” or “For each” instead of 
“For all”, e.g., one can rephrase “(x + 1)^2 = x^2 + 2x + 1 for all real 
numbers x” as “For each real number x, (x + 1)^2 is equal to x^2 + 2x + 1”. 
For the purposes of logic these rephrasings are equivalent. The symbol 
∀ can be used instead of “For all”, thus for instance “∀x ∈ X : P(x) is 
true” or “P(x) is true ∀x ∈ X” is synonymous with “P(x) is true for all 
x ∈ X”. 

Existential quantifiers. The statement “P(x) is true for some x of 
type T” means that there exists at least one x of type T for which P(x) 
is true, although it may be that there is more than one such x. (One 
would use a quantifier such as “for exactly one x” instead of “for some 
x” if one wanted both existence and uniqueness of such an x.) To prove 
such a statement it suffices to provide a single example of such an x. 
For instance, to show that 

x^2 + 2x - 8 = 0 for some real number x 

all one needs to do is find a single real number x for which x^2 + 2x - 8 = 0, 
for instance x = 2 will do. (One could also use x = -4, but one 
doesn’t need to use both.) Note that one has the freedom to select 
x to be anything one wants when proving a for-some statement; this 
is in contrast to proving a for-all statement, where one has to let x be 
arbitrary. (One can compare the two statements by thinking of two 
games between you and an opponent. In the first game, the opponent 
gets to pick what x is, and then you have to prove P(x); if you can 
always win this game, then you have proven that P(x) is true for all x. 
In the second game, you get to choose what x is, and then you prove 
P(x); if you can win this game, you have proven that P(x) is true for 
some x.) 

Usually, saying something is true for all x is much stronger than 
just saying it is true for some x. There is one exception though: if 
the condition on x is impossible to satisfy, then the for-all statement is 
vacuously true, but the for-some statement is false. For instance 

6 < 2x < 4 for all 3 < x < 2 


is true, but 


6 < 2x < 4 for some 3 < x < 2 


is false. 
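
Python's built-in `all` and `any` follow the same convention on an empty collection, which gives a quick way to see the exception in action. In this sketch the integers from -100 to 100 are an assumed stand-in for the real numbers; since no number at all satisfies 3 < x < 2, the filtered domain is empty whatever range one starts from.

```python
# ASSUMPTION: integers in [-100, 100] stand in for the real numbers.
# No number satisfies 3 < x < 2, so the domain below is empty.
domain = [x for x in range(-100, 101) if 3 < x < 2]

# "6 < 2x < 4 for all 3 < x < 2": vacuously true on an empty domain.
for_all = all(6 < 2 * x < 4 for x in domain)

# "6 < 2x < 4 for some 3 < x < 2": false, since there is no witness.
for_some = any(6 < 2 * x < 4 for x in domain)

print(domain, for_all, for_some)
```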

One can use phrases such as “For at least one” or “There exists 
... such that” instead of “For some”. For instance, one can rephrase 
“x^2 + 2x - 8 = 0 for some real number x” as “There exists a real number 
x such that x^2 + 2x - 8 = 0”. The symbol ∃ can be used instead of 
“There exists ... such that”, thus for instance “∃x ∈ X : P(x) is true” 
is synonymous with “P(x) is true for some x ∈ X”. 

A.5 Nested quantifiers

One can nest two or more quantifiers together. For instance, consider 
the statement 


For every positive number $x$, there exists a
positive number $y$ such that $y^2 = x$.

What does this statement mean? It means that for each positive 
number x, the statement 

There exists a positive number $y$ such that $y^2 = x$

is true. In other words, one can find a positive square root of x for 
each positive number x. So the statement is saying that every positive 
number has a positive square root. 

To continue the gaming metaphor, suppose you play a game where 
your opponent first picks a positive number x, and then you pick a 
positive number y. You win the game if $y^2 = x$. If you can always win
the game regardless of what your opponent does, then you have proven
that for every positive $x$, there exists a positive $y$ such that $y^2 = x$.

Negating a universal statement produces an existential statement. 
The negation of “All swans are white” is not “All swans are not white” , 
but rather “There is some swan which is not white”. Similarly, the 
negation of “For every $0 < x < \pi/2$, we have $\cos(x) > 0$” is “For some
$0 < x < \pi/2$, we have $\cos(x) \leq 0$”, not “For every $0 < x < \pi/2$, we have
$\cos(x) \leq 0$”.

Negating an existential statement produces a universal statement. 
The negation of “There exists a black swan” is not “There exists a swan 
which is non-black”, but rather “All swans are non-black”. Similarly, 
the negation of “There exists a real number $x$ such that $x^2 + x + 1 = 0$”
is “For every real number $x$, $x^2 + x + 1 \neq 0$”, not “There exists a real
number $x$ such that $x^2 + x + 1 \neq 0$”. (The situation here is very similar
to how “and” and “or” behave with respect to negations.) 
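
On a finite collection of sample points, the two negation rules can be tested directly. The sketch below is an illustration under an assumption: the interval $0 < x < \pi/2$ is replaced by 99 evenly spaced sample points, so it demonstrates the logical pattern rather than proving anything about all real numbers. The names `xs` and `P` are introduced here only for the example.

```python
import math

# Hypothetical finite sample of the open interval (0, pi/2).
xs = [k * (math.pi / 2) / 100 for k in range(1, 100)]

def P(x):
    """The property 'cos(x) > 0'."""
    return math.cos(x) > 0

# not (for all x: P(x))   has the same truth value as   (for some x: not P(x))
not_for_all = not all(P(x) for x in xs)
some_not = any(not P(x) for x in xs)

# not (for some x: P(x))  has the same truth value as   (for all x: not P(x))
not_for_some = not any(P(x) for x in xs)
all_not = all(not P(x) for x in xs)

print(not_for_all == some_not, not_for_some == all_not)
```

Whatever property `P` and sample `xs` one substitutes, the two comparisons printed at the end agree; this is the finite counterpart of the rules stated above.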

If you know that a statement P(x) is true for all x, then you can set 
x to be anything you want, and P(x) will be true for that value of x; 
this is what “for all” means. Thus for instance if you know that 

$(x + 1)^2 = x^2 + 2x + 1$ for all real numbers $x$,

then you can conclude for instance that

$(\pi + 1)^2 = \pi^2 + 2\pi + 1$,

or for instance that

$(\cos(y) + 1)^2 = \cos(y)^2 + 2\cos(y) + 1$ for all real numbers $y$

(because if y is real, then cos (y) is also real), and so forth. Thus universal 
statements are very versatile in their applicability - you can get P(x) to 
hold for whatever x you wish. Existential statements, by contrast, are 
more limited; if you know that 

$x^2 + 2x - 8 = 0$ for some real number $x$

then you cannot simply substitute in any real number you wish, e.g., $\pi$,
and conclude that $\pi^2 + 2\pi - 8 = 0$. However, you can of course still
conclude that $x^2 + 2x - 8 = 0$ for some real number $x$; it’s just that
you don’t get to pick which x it is. (To continue the gaming metaphor, 
you can make P(x) hold, but your opponent gets to pick x for you, you 
don’t get to choose for yourself.)

Remark A.5.1. In the history of logic, quantifiers were formally studied
thousands of years before Boolean logic was. Indeed, Aristotlean logic, 
developed of course by Aristotle (384BC - 322BC) and his school, deals 
with objects, their properties, and quantifiers such as “for all” and “for 
some”. A typical line of reasoning (or syllogism ) in Aristotlean logic 
goes like this: “All men are mortal. Socrates is a man. Hence, Socrates 
is mortal”. Aristotlean logic is a subset of mathematical logic, but is 
not as expressive because it lacks the concept of logical connectives such 
as and, or, or if-then (although “not” is allowed), and also lacks the 
concept of a binary relation such as = or <. 

Swapping the order of two quantifiers may or may not make a dif- 
ference to the truth of a statement. Swapping two “for all” quantifiers 
is harmless: a statement such as 

For all real numbers $a$, and for all real numbers $b$,
we have $(a + b)^2 = a^2 + 2ab + b^2$

is logically equivalent to the statement

For all real numbers $b$, and for all real numbers $a$,
we have $(a + b)^2 = a^2 + 2ab + b^2$

(why? The reason has nothing to do with whether the identity
$(a + b)^2 = a^2 + 2ab + b^2$ is actually true or not). Similarly, swapping two “there
exists” quantifiers has no effect:

There exists a real number $a$, and there exists a real number $b$,
such that $a^2 + b^2 = 0$

is logically equivalent to

There exists a real number $b$, and there exists a real number $a$,
such that $a^2 + b^2 = 0$.

However, swapping a “for all” with a “there exists” makes a lot of 
difference. Consider the following two statements: 

(a) For every integer n, there exists an integer m which is larger 
than n. 

(b) There exists an integer m such that m is larger than n for every 
integer n.

Statement (a) is obviously true: if your opponent hands you an integer
n, you can always find an integer m which is larger than n. But
Statement (b) is false: if you choose m first, then you cannot ensure
that m is larger than every integer n; your opponent can easily pick a
number n bigger than m to defeat that. The crucial difference between
the two statements is that in Statement (a), the integer n was chosen
first, and the integer m could then be chosen in a manner depending on n;
but in Statement (b), one was forced to choose m first, without knowing 
in advance what n is going to be. In short, the reason why the order of 
quantifiers is important is that the inner variables may possibly depend 
on the outer variables, but not vice versa. 
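
The effect of swapping a “for all” with a “there exists” can be seen on a small finite example. The sketch below is an illustration only: since the integers cannot be enumerated in a finite check, it uses the hypothetical relation “m equals n + 1” on small ranges in place of “m is larger than n”. The point it shows is the same: an inner variable may be chosen depending on the outer one, but not the other way round.

```python
ns = range(10)   # sample values for n
ms = range(11)   # sample values for m, large enough to contain every n + 1

def R(n, m):
    """Hypothetical relation standing in for 'm is larger than n'."""
    return m == n + 1

# For every n there exists an m with R(n, m): m is chosen after n is known.
forall_exists = all(any(R(n, m) for m in ms) for n in ns)

# There exists an m such that R(n, m) for every n: m is fixed before n is known.
exists_forall = any(all(R(n, m) for n in ns) for m in ms)

print(forall_exists, exists_forall)
```

The first check succeeds because the choice m = n + 1 is allowed to vary with n; the second fails because no single m works against every n.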

— Exercises — 

Exercise A.5.1. What does each of the following statements mean, and which
of them are true? Can you find gaming metaphors for each of these statements?

(a) For every positive number $x$, and every positive number $y$, we have $y^2 = x$.

(b) There exists a positive number $x$ such that for every positive number $y$,
we have $y^2 = x$.

(c) There exists a positive number $x$, and there exists a positive number $y$,
such that $y^2 = x$.

(d) For every positive number $y$, there exists a positive number $x$ such that
$y^2 = x$.

(e) There exists a positive number $y$ such that for every positive number $x$,
we have $y^2 = x$.

A.6 Some examples of proofs and quantifiers

Here we give some simple examples of proofs involving the “for all” and 
“there exists” quantifiers. The results themselves are simple, but you 
should pay attention instead to how the quantifiers are arranged and 
how the proofs are structured. 

Proposition A.6.1. For every $\varepsilon > 0$ there exists a $\delta > 0$ such that
$2\delta < \varepsilon$.

Proof. Let $\varepsilon > 0$ be arbitrary. We have to show that there exists a $\delta > 0$
such that $2\delta < \varepsilon$. We only need to pick one such $\delta$; choosing $\delta := \varepsilon/3$
will work, since one then has $2\delta = 2\varepsilon/3 < \varepsilon$. □

Notice how $\varepsilon$ has to be arbitrary, because we are proving something
for every $\varepsilon$; on the other hand, $\delta$ can be chosen as you wish, because you
only need to show that there exists a $\delta$ which does what you want. Note
also that $\delta$ can depend on $\varepsilon$, because the $\delta$-quantifier is nested inside
the $\varepsilon$-quantifier. If the quantifiers were reversed, i.e., if you were asked
to prove “There exists a $\delta > 0$ such that for every $\varepsilon > 0$, $2\delta < \varepsilon$”, then
you would have to select $\delta$ first before being given $\varepsilon$. In this case it is
impossible to prove the statement, because it is false (why?).

Normally, when one has to prove a “There exists...” statement, e.g.,
“Prove that there exists an $\varepsilon > 0$ such that X is true”, one proceeds
by selecting $\varepsilon$ carefully, and then showing that X is true for that $\varepsilon$.
However, this sometimes requires a lot of foresight, and it is legitimate
to defer the selection of $\varepsilon$ until later in the argument, when it becomes
clearer what properties $\varepsilon$ needs to satisfy. The only thing to watch out
for is to make sure that $\varepsilon$ does not depend on any of the bound variables
nested inside X. For instance:

Proposition A.6.2. There exists an $\varepsilon > 0$ such that $\sin(x) > x/2$ for
all $0 < x < \varepsilon$.

Proof. We pick $\varepsilon > 0$ to be chosen later, and let $0 < x < \varepsilon$. Since the
derivative of $\sin(x)$ is $\cos(x)$, we see from the mean-value theorem that

$$\frac{\sin(x)}{x} = \frac{\sin(x) - \sin(0)}{x - 0} = \cos(y)$$

for some $0 < y < x$. Thus in order to ensure that $\sin(x) > x/2$, it
would suffice to ensure that $\cos(y) > 1/2$. To do this, it would suffice
to ensure that $0 < y < \pi/3$ (since the cosine function takes the value
of 1 at 0, takes the value of 1/2 at $\pi/3$, and is decreasing in between).
Since $0 < y < x$ and $0 < x < \varepsilon$, we see that $0 < y < \varepsilon$. Thus if we pick
$\varepsilon := \pi/3$, then we have $0 < y < \pi/3$ as desired, and so we can ensure
that $\sin(x) > x/2$ for all $0 < x < \varepsilon$. □

Note that the value of $\varepsilon$ that we picked at the end did not depend on
the nested variables x and y. This makes the above argument legitimate. 
Indeed, we can rearrange it so that we don’t have to postpone anything: 

Proof. We choose $\varepsilon := \pi/3$; clearly $\varepsilon > 0$. Now we have to show that for
all $0 < x < \pi/3$, we have $\sin(x) > x/2$. So let $0 < x < \pi/3$ be arbitrary.
By the mean-value theorem we have

$$\frac{\sin(x)}{x} = \frac{\sin(x) - \sin(0)}{x - 0} = \cos(y)$$

for some $0 < y < x$. Since $0 < y < x$ and $0 < x < \pi/3$, we have
$0 < y < \pi/3$. Thus $\cos(y) > \cos(\pi/3) = 1/2$, since cos is decreasing on the
interval $[0, \pi/3]$. Thus we have $\sin(x)/x > 1/2$ and hence $\sin(x) > x/2$
as desired. □

If we had chosen $\varepsilon$ to depend on $x$ and $y$ then the argument would
not be valid, because $\varepsilon$ is the outer variable and $x$, $y$ are nested inside
it.
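
As a sanity check on the choice $\varepsilon := \pi/3$, one can test the inequality numerically. The sketch below is not a proof: it assumes a hypothetical sample of 999 evenly spaced points in the open interval from 0 to $\pi/3$ and relies on floating-point arithmetic. Note that `eps` is fixed once, before any sample point x is examined, which mirrors the requirement that the outer variable must not depend on the nested ones.

```python
import math

# The outer variable: chosen once, before any x is looked at.
eps = math.pi / 3

# Hypothetical finite sample of the open interval (0, eps).
xs = [k * eps / 1000 for k in range(1, 1000)]

# The claim of the proposition at each sample point.
holds = all(math.sin(x) > x / 2 for x in xs)

# The intermediate step of the proof: sin(x)/x equals cos(y) for some
# 0 < y < x, and cos(y) > 1/2 there, so the ratio should exceed 1/2.
ratios_ok = all(math.sin(x) / x > 0.5 for x in xs)

print(holds, ratios_ok)
```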

A.7 Equality

As mentioned before, one can create statements by starting with expressions
(such as $2 \times 3 + 5$) and then asking whether an expression obeys
a certain property, or whether two expressions are related by some sort
of relation ($=$, $<$, $\in$, etc.). There are many relations, but the most
important one is equality, and it is worth spending a little time reviewing
this concept.

Equality is a relation linking two objects x, y of the same type T 
(e.g., two integers, or two matrices, or two vectors, etc.). Given two 
such objects x and y, the statement x = y may or may not be true; it 
depends on the value of x and y and also on how equality is defined for 
the class of objects under consideration. For instance, as real numbers, 
the two numbers 0.9999 . . . and 1 are equal. In modular arithmetic with 
modulus 10 (in which numbers are considered equal to their remainders 
modulo 10), the numbers 12 and 2 are considered equal, 12 = 2, even 
though this is not the case in ordinary arithmetic. 

How equality is defined depends on the class T of objects under 
consideration, and to some extent is just a matter of definition. However, 
for the purposes of logic we require that equality obeys the following four 
axioms of equality: 

• (Reflexive axiom). Given any object x, we have x = x. 

• (Symmetry axiom). Given any two objects x and y of the same 
type, if x = y, then y = x. 

• (Transitive axiom). Given any three objects x, y, z of the same 
type, if x = y and y = z, then x = z. 

• (Substitution axiom). Given any two objects x and y of the same
type, if x = y, then f(x) = f(y) for all functions or operations f.
Similarly, for any property P(x) depending on x, if x = y, then
P(x) and P(y) are equivalent statements.

The first three axioms are clear; together, they assert that equality
is an equivalence relation. To illustrate the substitution axiom we give some
examples.

Example A.7.1. Let x and y be real numbers. If x = y, then 2x = 2y,
and sin(x) = sin(y). Furthermore, x + z = y + z for any real number z.

Example A.7.2. Let n and m be integers. If n is odd and n = m, then
m must also be odd. If we have a third integer k, and we know that
n > k and n = m, then we also know that m > k.

Example A.7.3. Let x, y, z be real numbers. If we know that $x = \sin(y)$
and $y = z^2$, then (by the substitution axiom) we have $\sin(y) = \sin(z^2)$,
and hence (by the transitive axiom) we have $x = \sin(z^2)$.

Thus, from the point of view of logic, we can define equality on a class
of objects however we please, so long as it obeys the reflexive, symmetry,
and transitive axioms, and is consistent with all other operations on the
class of objects under discussion in the sense that the substitution axiom
is true for all of those operations. For instance, if we decided one day
to modify the integers so that 12 was now equal to 2, one could only do so
if one also made sure that 2 was now equal to 12, and that f(2) = f(12)
for any operation f on these modified integers. For instance, we now
need 2 + 5 to be equal to 12 + 5. (In this case, pursuing this line of
reasoning will eventually lead to modular arithmetic with modulus 10.)
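
The idea of redefining equality while keeping it consistent with the operations can be sketched in code. The class `Mod10` and the helper `digit_count` below are hypothetical names introduced only for this illustration: equality is declared to mean “same remainder modulo 10”, and we then check a few instances of the axioms for addition and multiplication, together with one operation that fails the substitution axiom and would therefore have to be abandoned.

```python
class Mod10:
    """An integer for which equality means 'same remainder modulo 10'."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return (self.value - other.value) % 10 == 0

    def __hash__(self):
        # Equal objects must have equal hashes.
        return hash(self.value % 10)

    def __add__(self, other):
        return Mod10(self.value + other.value)

    def __mul__(self, other):
        return Mod10(self.value * other.value)


def digit_count(x):
    """An operation on ordinary integers that ignores the new equality."""
    return len(str(x.value))


two, twelve, five = Mod10(2), Mod10(12), Mod10(5)

reflexive = two == two
symmetric = (twelve == two) and (two == twelve)

# Substitution holds for these two sample operations ...
substitution_add = (two + five) == (twelve + five)
substitution_mul = (two * two) == (twelve * twelve)

# ... but fails for digit_count, although two == twelve.
substitution_digits = digit_count(two) == digit_count(twelve)

print(reflexive, symmetric, substitution_add, substitution_mul, substitution_digits)
```

Checking a handful of instances is of course not a verification of the axioms for all operations; the sketch is only meant to make concrete what “consistent with all other operations” asks for.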


— Exercises — 

Exercise A.7.1. Suppose you have four real numbers a, b, c, d and you know
that a = b and c = d. Use the above four axioms to deduce that a + d = b + c.

In Chapters 2, 4, 5 we painstakingly constructed the basic number systems
of mathematics: the natural numbers, integers, rationals, and reals.
Natural numbers were simply postulated to exist, and to obey five
axioms; the integers then came via (formal) differences of the natural 
numbers; the rationals then came from (formal) quotients of the integers, 
and the reals then came from (formal) limits of the rationals. 

This is all very well and good, but it does seem somewhat alien to 
one’s prior experience with these numbers. In particular, very little use 
was made of the decimal system, in which the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
are combined to represent these numbers. Indeed, except for a number
of examples which were not essential to the main construction, the only
decimals we really used were the numbers 0, 1, and 2, and the latter two
can be rewritten as 0++ and (0++)++.

The basic reason for this is that the decimal system itself is not 
essential to mathematics. It is very convenient for computations, and 
we have grown accustomed to it thanks to a thousand years of use, 
but in the history of mathematics it is actually a comparatively recent 
invention. Numbers have been around for about ten thousand years 
(starting from scratch marks on cave walls), but the modern Hindu- 
Arabic base 10 system for representing numbers only dates from the 11th 
century or so. Some early civilizations relied on other bases; for instance 
the Babylonians used a base 60 system (which still survives in our time 
system of hours, minutes, and seconds, and in our angular system of 
degrees, minutes, and seconds). And the ancient Greeks were able to do 
quite advanced mathematics, despite the fact that the most advanced 
number representation system available to them was the Roman numeral 
system I, II, III, IV, . . ., which was horrendous for computations of even
two-digit numbers. And of course modern computing relies on binary,
hexadecimal, or byte-based (base 256) arithmetic instead of decimal,
while analog computers such as the slide rule do not really rely on any
number representation system at all. In fact, now that computers can do
the menial work of number-crunching, there is very little use for decimals
in modern mathematics. Indeed, we rarely use any numbers other than
one-digit numbers or one-digit fractions (as well as $e$, $\pi$, $i$) explicitly in
modern mathematical work; any more complicated numbers usually get
called more generic names such as $n$.

Nevertheless, the subject of decimals does deserve an appendix, because
it is so integral to the way we use mathematics in our everyday
life, and also because we do want to use such notation as 3.14159...
to refer to real numbers, as opposed to the far clunkier “$\mathrm{LIM}_{n \to \infty} a_n$,
where $a_1 := 3.1$, $a_2 := 3.14$, $a_3 := 3.141$, . . .”.

We begin by reviewing how the decimal system works for the positive 
integers, and then turn to the reals. Note that in this discussion we shall 
freely use all the results from earlier chapters. 

B.1 The decimal representation of natural numbers

In this section we will avoid the usual convention of abbreviating $a \times b$
as $ab$, since this would mean that decimals such as 34 might be misconstrued
as $3 \times 4$.

Definition B.1.1 (Digits). A digit is any one of the ten symbols 
0, 1, 2, 3, . . . , 9. We equate these digits with natural numbers by the 
formulae 0 = 0, 1 = 0++, 2 = 1++, etc. all the way up to 9 = 8++. 
We also define the number ten by the formula ten := 9++. (We cannot
use the decimal notation 10 to denote ten yet, because that presumes 
knowledge of the decimal system and would be circular.) 

Definition B.1.2 (Positive integer decimals). A positive integer decimal
is any string $a_n a_{n-1} \ldots a_0$ of digits, where $n \geq 0$ is a natural number,
and the first digit $a_n$ is non-zero. Thus, for instance, 3049 is a positive
integer decimal, but 0493 or 0 is not. We equate each positive integer
decimal with a positive integer by the formula

$$a_n a_{n-1} \ldots a_0 = \sum_{i=0}^{n} a_i \times \mathrm{ten}^i.$$

Remark B.1.3. Note in particular that this definition implies that

$$10 = 0 \times \mathrm{ten}^0 + 1 \times \mathrm{ten}^1 = \mathrm{ten}$$

and thus we can write ten as the more familiar 10. Also, a single digit
integer decimal is exactly equal to that digit itself, e.g., the decimal 3
by the above definition is equal to

$$3 = 3 \times \mathrm{ten}^0 = 3$$

so there is no possibility of confusion between a single digit, and a single 
digit decimal. (This is a subtle distinction, and not one which is worth 
losing much sleep over.) 
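
The formula in Definition B.1.2 can be transcribed almost literally into code. The function `value_of_decimal` and the constants `DIGITS` and `TEN` are hypothetical names chosen for this sketch; the host language's own integers are assumed as a model of the natural numbers, and its decimal literals are avoided inside the computation except for the digit 9 and the number 1.

```python
DIGITS = "0123456789"
TEN = 9 + 1  # the number ten, defined as the successor of 9


def value_of_decimal(s):
    """Return the positive integer denoted by the positive integer decimal s."""
    if len(s) == 0 or any(ch not in DIGITS for ch in s):
        raise ValueError("not a non-empty string of digits")
    if s[0] == "0":
        raise ValueError("the first digit must be non-zero")
    n = len(s) - 1
    # s = a_n a_{n-1} ... a_0, so the character at position n - i is the digit a_i.
    return sum(DIGITS.index(s[n - i]) * TEN ** i for i in range(n + 1))


print(value_of_decimal("3049"), value_of_decimal("10") == TEN)
```

As in the definition, strings such as 0493 or 0 are rejected rather than assigned a value.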

Now we show that this decimal system indeed represents the positive
integers. It is clear from the definition that every positive decimal
representation gives a positive integer, since the sum consists entirely of
natural numbers, and the last term $a_n \times \mathrm{ten}^n$ is non-zero by definition.

Theorem B.1.4 (Uniqueness and existence of decimal representations).
Every positive integer m is equal to exactly one positive integer decimal. 

Proof. We shall use the principle of strong induction (Proposition 2.2.14,
with $m_0 := 1$). For any positive integer $m$, let $P(m)$ denote the statement
“$m$ is equal to exactly one positive integer decimal”. Suppose we
already know $P(m')$ is true for all positive integers $m' < m$; we now
wish to prove $P(m)$.

First observe that either $m \geq \mathrm{ten}$ or $m \in \{1, 2, 3, 4, 5, 6, 7, 8, 9\}$.
(This is easily proved by ordinary induction.) Suppose first that
$m \in \{1, 2, 3, 4, 5, 6, 7, 8, 9\}$. Then $m$ clearly is equal to a positive integer
decimal consisting of a single digit, and there is only one single-digit
decimal which is equal to $m$. Furthermore, no decimal consisting of two
or more digits can equal $m$, since if $a_n \ldots a_0$ is such a decimal (with
$n > 0$) we have

$$a_n \ldots a_0 = \sum_{i=0}^{n} a_i \times \mathrm{ten}^i \geq a_n \times \mathrm{ten}^n \geq \mathrm{ten} > m.$$

Now suppose that $m \geq \mathrm{ten}$. Then by the Euclidean algorithm
(Proposition 2.3.9) we can write

$$m = s \times \mathrm{ten} + r$$

where $s$ is a positive integer, and $r \in \{0, 1, 2, 3, 4, 5, 6, 7, 8, 9\}$. Since

$$s < s \times \mathrm{ten} \leq s \times \mathrm{ten} + r = m$$

we can use the strong induction hypothesis and conclude that $P(s)$ is
true. In particular, $s$ has a decimal representation

$$s = b_p \ldots b_0 = \sum_{i=0}^{p} b_i \times \mathrm{ten}^i.$$

Multiplying by ten, we see that

$$s \times \mathrm{ten} = \sum_{i=0}^{p} b_i \times \mathrm{ten}^{i+1} = b_p \ldots b_0 0,$$

and then adding $r$ we see that

$$m = s \times \mathrm{ten} + r = \sum_{i=0}^{p} b_i \times \mathrm{ten}^{i+1} + r = b_p \ldots b_0 r.$$

Thus $m$ has at least one decimal representation. Now we need to show
that $m$ has at most one decimal representation. Suppose for sake of
contradiction that we have at least two different representations

$$m = a_n \ldots a_0 = a'_{n'} \ldots a'_0.$$

First observe by the previous computation that

$$a_n \ldots a_0 = (a_n \ldots a_1) \times \mathrm{ten} + a_0$$

and

$$a'_{n'} \ldots a'_0 = (a'_{n'} \ldots a'_1) \times \mathrm{ten} + a'_0$$

and so after some algebra we obtain

$$a'_0 - a_0 = (a_n \ldots a_1 - a'_{n'} \ldots a'_1) \times \mathrm{ten}.$$

The right-hand side is a multiple of ten, while the left-hand side lies
strictly between $-\mathrm{ten}$ and $+\mathrm{ten}$. Thus both sides must be equal to 0.
This means that $a_0 = a'_0$ and $a_n \ldots a_1 = a'_{n'} \ldots a'_1$. But by previous
arguments, we know that $a_n \ldots a_1$ is a smaller integer than $a_n \ldots a_0$.
Thus by the strong induction hypothesis, the number $a_n \ldots a_1$ has only
one decimal representation, which means that $n'$ must equal $n$ and $a'_i$
must equal $a_i$ for all $i = 1, \ldots, n$. Thus the decimals $a_n \ldots a_0$ and
$a'_{n'} \ldots a'_0$ are in fact identical, contradicting the assumption that they
were different. □
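
The existence half of the proof is effectively a recursive procedure, and it may help to see it written out. The sketch below follows the same two cases (a single digit, or division by ten with remainder followed by recursion on the smaller quotient). The function name `decimal_representation` is an assumption of this example, and Python's built-in `divmod` is used as a stand-in for the Euclidean algorithm of Proposition 2.3.9.

```python
DIGITS = "0123456789"
TEN = 9 + 1


def decimal_representation(m):
    """Return the decimal digits of a positive integer m as a string."""
    if m <= 0:
        raise ValueError("m must be a positive integer")
    if m < TEN:
        # Case m in {1, ..., 9}: a single-digit decimal.
        return DIGITS[m]
    # Case m >= ten: write m = s * ten + r with s positive and r a digit.
    s, r = divmod(m, TEN)
    # Since s < m, the recursion terminates (this is the strong induction step).
    return decimal_representation(s) + DIGITS[r]


print(decimal_representation(3049))
```

Each recursive call appends the digit r to the representation of s, exactly as in the computation $m = b_p \ldots b_0 r$ above.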


We refer to the decimal given by the above theorem as the decimal
representation of $m$. Once one has this decimal representation, one can
then derive the usual laws of long addition and long multiplication to
connect the decimal representation of $x + y$ or $x \times y$ to that of $x$ or $y$
(Exercise B.1.1).

Once one has decimal representations of positive integers, one can of
course represent negative integers decimally as well by using the $-$ sign.
Finally, we let 0 be a decimal as well. This gives decimal representations
of all integers. Every rational is then the ratio of two decimals, e.g.,
335/113 or $-1/2$ (with the denominator required to be non-zero, of
course), though there may be more than one way to represent a rational
as such a ratio, e.g., 6/4 = 3/2.

Since ten = 10, we will now use 10 instead of ten throughout, as is 
customary. 


— Exercises — 

Exercise B.1.1. The purpose of this exercise is to demonstrate that the procedure
of long addition taught to you in elementary school is actually valid. Let
$A = a_n \ldots a_0$ and $B = b_m \ldots b_0$ be positive integer decimals. Let us adopt the
convention that $a_i = 0$ when $i > n$, and $b_i = 0$ when $i > m$; for instance, if
$A = 372$, then $a_0 = 2$, $a_1 = 7$, $a_2 = 3$, $a_3 = 0$, $a_4 = 0$, and so forth. Define
the numbers $c_0, c_1, \ldots$ and $\varepsilon_0, \varepsilon_1, \ldots$ recursively by the following long addition
algorithm.

• We set $\varepsilon_0 := 0$.

• Now suppose that $\varepsilon_i$ has already been defined for some $i \geq 0$. If
$a_i + b_i + \varepsilon_i < 10$, we set $c_i := a_i + b_i + \varepsilon_i$ and $\varepsilon_{i+1} := 0$; otherwise, if
$a_i + b_i + \varepsilon_i \geq 10$, we set $c_i := a_i + b_i + \varepsilon_i - 10$ and $\varepsilon_{i+1} := 1$. (The number
$\varepsilon_{i+1}$ is the “carry digit” from the $i$th decimal place to the $(i+1)$th decimal
place.)

Prove that the numbers $c_0, c_1, \ldots$ are all digits, and that there exists an $l$
such that $c_l \neq 0$ and $c_i = 0$ for all $i > l$. Then show that $c_l c_{l-1} \ldots c_1 c_0$ is the
decimal representation of $A + B$.

Note that one could in fact use this algorithm to define addition, but it 
would look extremely complicated, and to prove even such simple facts as 
(a + b) + c = a + (b + c) would be rather difficult. This is one of the reasons 
why we have avoided the decimal system in our construction of the natural 
numbers. The procedure for long multiplication (or long subtraction, or long 
division) is even worse to lay out rigorously; we will not do so here. 
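
The long addition algorithm of Exercise B.1.1 can be written out as a short program, which may help in seeing what has to be proved. This is a sketch only and does not replace the exercise: the function name `long_addition` and the representation of decimals as strings of digit characters are assumptions made for the example, and the variable `carry` plays the role of the carry digit $\varepsilon_i$.

```python
def long_addition(A, B):
    """Add two positive integer decimals, given and returned as digit strings."""
    a = [int(ch) for ch in reversed(A)]   # a[i] is the digit a_i
    b = [int(ch) for ch in reversed(B)]   # b[i] is the digit b_i
    length = max(len(a), len(b)) + 1      # one extra place for a final carry
    a += [0] * (length - len(a))          # convention: a_i = 0 for i > n
    b += [0] * (length - len(b))          # convention: b_i = 0 for i > m

    c = []
    carry = 0                             # the initial carry digit is 0
    for i in range(length):
        total = a[i] + b[i] + carry
        if total < 10:
            c.append(total)
            carry = 0
        else:
            c.append(total - 10)
            carry = 1

    # Find l, the largest index with c_l != 0; it exists because A + B > 0.
    l = max(i for i, digit in enumerate(c) if digit != 0)
    return "".join(str(c[i]) for i in range(l, -1, -1))


print(long_addition("372", "9989"))
```

Comparing the output against the language's built-in integer addition on many inputs gives some confidence in the algorithm, but the exercise asks for a proof that it works for all positive integer decimals.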


B.2 The decimal representation of real numbers 

We need a new symbol: the decimal point.

Definition B.2.1 (Real decimals). A real decimal is any sequence of
digits, and a decimal point, arranged as

$$\pm a_n \ldots a_0 . a_{-1} a_{-2} \ldots$$

which is finite to the left of the decimal point (so $n$ is a natural number),
but infinite to the right of the decimal point, where $\pm$ is either $+$ or $-$,
and $a_n \ldots a_0$ is a natural number decimal (i.e., either a positive integer
decimal, or 0). This decimal is equated to the real number

$$\pm a_n \ldots a_0 . a_{-1} a_{-2} \ldots = \pm 1 \times \sum_{i=-\infty}^{n} a_i \times 10^i.$$

The series is always convergent (Exercise B.2.1). Next, we show that 
every real number has at least one decimal representation: 

Theorem B.2.2 (Existence of decimal representations). Every real
number $x$ has at least one decimal representation

$$x = \pm a_n \ldots a_0 . a_{-1} a_{-2} \ldots$$

Proof. We first note that $x = 0$ has the decimal representation 0.000....
Also, once we find a decimal representation for $x$, we automatically get
a decimal representation for $-x$ by changing the sign $\pm$. Thus it suffices
to prove the theorem for positive real numbers $x$ (by Proposition 5.4.4).

Let $n \geq 0$ be any natural number. From the Archimedean property
(Corollary 5.4.13) we know that there is a natural number $M$ such that
$M \times 10^{-n} > x$. Since $0 \times 10^{-n} \leq x$, we thus see that there must exist a
natural number $s_n$ such that $s_n \times 10^{-n} \leq x$ and $(s_n\text{++}) \times 10^{-n} > x$. (If no
such natural number existed, one could use induction to conclude that
$s \times 10^{-n} \leq x$ for all natural numbers $s$, contradicting the Archimedean
property.)

Now consider the sequence $s_0, s_1, s_2, \ldots$. Since we have

$$s_n \times 10^{-n} \leq x < (s_n + 1) \times 10^{-n}$$

we thus have

$$(10 \times s_n) \times 10^{-(n+1)} \leq x < (10 \times s_n + 10) \times 10^{-(n+1)}.$$

On the other hand, we have

$$s_{n+1} \times 10^{-(n+1)} \leq x < (s_{n+1} + 1) \times 10^{-(n+1)}$$

and hence we have

$$10 \times s_n < s_{n+1} + 1 \quad \text{and} \quad s_{n+1} < 10 \times s_n + 10.$$

From these two inequalities we see that we have

$$10 \times s_n \leq s_{n+1} \leq 10 \times s_n + 9$$

and hence we can find a digit $a_{n+1}$ such that

$$s_{n+1} = 10 \times s_n + a_{n+1}$$

and hence

$$s_{n+1} \times 10^{-(n+1)} = s_n \times 10^{-n} + a_{n+1} \times 10^{-(n+1)}.$$

From this identity and induction, we can obtain the formula

$$s_n \times 10^{-n} = s_0 + \sum_{i=1}^{n} a_i \times 10^{-i}.$$

Now we take limits of both sides (using Exercise B.2.1) to obtain

$$\lim_{n \to \infty} s_n \times 10^{-n} = s_0 + \sum_{i=1}^{\infty} a_i \times 10^{-i}.$$

On the other hand, we have

$$x - 10^{-n} < s_n \times 10^{-n} \leq x$$

for all $n$, so by the squeeze test (Corollary 6.4.14) we have

$$\lim_{n \to \infty} s_n \times 10^{-n} = x.$$

Thus we have

$$x = s_0 + \sum_{i=1}^{\infty} a_i \times 10^{-i}.$$

Since $s_0$ already has a positive integer decimal representation by Theorem
B.1.4, we thus see that $x$ has a decimal representation as desired. □
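
The construction in the proof can be carried out explicitly whenever x can be handled exactly. The sketch below is an illustration under an assumption: it restricts attention to positive rational x, represented with Python's `fractions.Fraction`, so that each $s_n$ (the largest natural number with $s_n \times 10^{-n} \leq x$) can be computed exactly as a floor. The function name `decimal_digits` is hypothetical.

```python
from fractions import Fraction
import math


def decimal_digits(x, count):
    """Return (s_0, [a_1, ..., a_count]) for a positive rational x, as in the proof."""
    x = Fraction(x)
    if x <= 0:
        raise ValueError("x must be positive")
    s0 = math.floor(x)                    # s_0 <= x < s_0 + 1
    s_prev = s0
    digits = []
    for n in range(1, count + 1):
        s_n = math.floor(x * 10 ** n)     # s_n * 10^(-n) <= x < (s_n + 1) * 10^(-n)
        a_n = s_n - 10 * s_prev           # the digit with s_n = 10 * s_(n-1) + a_n
        assert 0 <= a_n <= 9              # guaranteed by the two inequalities in the proof
        digits.append(a_n)
        s_prev = s_n
    return s0, digits


print(decimal_digits(Fraction(22, 7), 6))
```

For a general real number no such finite computation is available, which is why the proof works with the sequence $s_n$ abstractly and passes to the limit.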

There is however one slight flaw with the decimal system: it is possible
for one real number to have two decimal representations.

Proposition B.2.3 (Failure of uniqueness of decimal representations).
The number 1 has two different decimal representations: 1.000... and
0.999....

Proof. The representation 1 = 1.000... is clear. Now let’s compute 
0.999 .... By definition, this is the limit of the Cauchy sequence 

0.9,0.99,0.999,0.9999,.... 

But this sequence has 1 as a formal limit by Proposition 5.2.8. □ 

It turns out that these are the only two decimal representations of 1 
(Exercise B.2.2). In fact, as it turns out, all real numbers have either one 
or two decimal representations - two if the real is a terminating decimal, 
and one otherwise (Exercise B.2.3). 
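
The convergence of 0.9, 0.99, 0.999, . . . to 1 can be made concrete with exact rational arithmetic. The sketch below is an illustration, not a proof: the helper `nines` is a hypothetical name, and only the first seven terms are computed. It shows that the gap between 1 and the decimal with n nines is exactly $10^{-n}$, which can be made smaller than any positive rational.

```python
from fractions import Fraction


def nines(n):
    """Return 0.99...9 with n nines, as an exact fraction."""
    return sum(Fraction(9, 10 ** i) for i in range(1, n + 1))


# The distance from 1 to the n-th term of the sequence, for n = 1, ..., 7.
gaps = [1 - nines(n) for n in range(1, 8)]

print(gaps)
```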

— Exercises — 

Exercise B.2.1. If $a_n \ldots a_0 . a_{-1} a_{-2} \ldots$ is a real decimal, show that the series
$\sum_{i=-\infty}^{n} a_i \times 10^i$ is absolutely convergent.

Exercise B.2.2. Show that the only decimal representations

$$1 = \pm a_n \ldots a_0 . a_{-1} a_{-2} \ldots$$

of 1 are 1 = 1.000... and 1 = 0.999....

Exercise B.2.3. A real number $x$ is said to be a terminating decimal if we have
$x = n/10^{-m}$ for some integers $n, m$. Show that if $x$ is a terminating decimal,
then $x$ has exactly two decimal representations, while if $x$ is not a terminating
decimal, then $x$ has exactly one decimal representation.

Exercise B.2.4. Rewrite the proof of Corollary 8.3.4 using the decimal system.

Index

for rationals, 86
abstraction, 21-22
addition
of functions, 219
antiderivative, 298
Archimedean property, 115
Aristotlean logic, 326 
associativity 

of addition in N, 26 
of composition, 52, 53 
of multiplication in N, 30 
see also: ring, field, laws of 
algebra 

asymptotic discontinuity, 234 
Axiom(s) 

in mathematics, 21-22 
of choice, 36, 65, 200 
of comprehension: see 
Axiom of universal 
specification 
of countable choice, 200 
of equality, 329 


© Springer Science+Business Media Singapore 2016 and Hindustan Book Agency 2015 
T. Tao, Analysis I, Texts and Readings in Mathematics 37,

DOI 10.1007/978-981-10-1789-6 




of foundation: see Axiom
of induction: see principle
of mathematical
of infinity, 44
of natural numbers: see
43-44, 48, 58
46

bijection, 54
bound variable, 156, 321, 328
function, 234
uniqueness of, 70
Cartesian product, 62 
infinite, 199, 200
chain rule, 256
arbitrary, 200
of addition in N, 26
of multiplication in N, 30
for finite series, 157
for infinite series, 170
of the reals, 146
composition of functions, 52 
conjunction (and), 309 
connectedness, 268 
constant 

function, 51, 272 
sequence, 148 
continuity, 227 

and convergence, 222 
continuum, 211 

hypothesis, 197 
contrapositive, 316 
convergence 

of a function at a point, 221
of series, 165
converse, 315 
corollary, 25 
countability, 181 

of the integers, 185 
of the rationals, 187 

de Morgan laws, 43 
decimal 

negative integer, 335 
non-uniqueness of 

representation, 338 
point, 335 

positive integer, 335 
real, 336 

denumerable: see countable 
derivative, 251 
difference rule, 255 
difference set, 42 
differentiability 
at a point, 251 
digit, 332 
direct sum 

of functions, 63 
discontinuity: see singularity 


disjoint sets, 42 
disjunction (or), 309 

inclusive vs. exclusive, 310 
distance 
in Q, 87 
in R, 126 
distributive law 

for natural numbers, 30 
see also: laws of algebra 
divergence 

of sequences, 4 
of series, 3, 165 
see also: convergence 
divisibility, 207 
division 

by zero, 3 
formal (//), 82 
of functions, 219 
of rationals, 85 
domain, 49 

dominate: see majorize 

dominated convergence: see 
Lebesgue dominated 
convergence theorem 
doubly infinite, 212 
dummy variable: see bound 
variable 

empty 

Cartesian product, 64 
function, 52 
sequence, 64 
series, 160 
set, 36 
equality, 329 

for functions, 51 
for sets, 35 
of cardinality, 68 
equivalence 

of sequences, 101, 245 





relation, 330 
Euclidean algorithm, 31 
exponentiation
with base and exponent in
N, 32 

with base in Q and 

exponent in Z, 89, 90 
with base in R and 
exponent in Z, 122 
with base in R + and 
exponent in Q, 124 
with base in R + and 
exponent in R, 154 
expression, 308 
extended real number system 
R*, 119, 133 

extremum: see maximum, 
minimum 

factorial, 164 
family, 60 
field, 84 

ordered, 86 
finite set, 70 
fixed point theorem, 241 
forward image: see image 
free variable, 321 
Fubini’s theorem 

for finite series, 163 
for infinite series, 188 
see also: interchanging 
integrals/sums with 
integrals/sums 
function, 49 

implicit definition, 50 
fundamental theorems of 
calculus, 296, 298 

geometric series, 165, 171 


formula, 171, 174 
graph, 51, 66, 219 
greatest lower bound: see least 
upper bound 

half-infinite, 212
half-open, 212 
harmonic series, 173 
Hausdorff maximality principle, 
209 

Heine-Borel theorem 
for the real line, 216 

identity map (or operator), 56 
if: see implication 
iff (if and only if), 27 
ill-defined, 306, 309 
image 

inverse image, 56 
of sets, 56 

implication (if), 312 
improper integral, 278 
inclusion map, 56 
inconsistent, 198, 199 
index of summation: see 
dummy variable 
index set, 60 

induction: see Principle of 
mathematical 
induction 

infimum: see supremum 
infinite 

interval, 212 
set, 70 

injection: see one-to-one 
function 

integer part, 91, 116 
integers Z 

definition, 74
identification with 
rationals, 83 
interspersing with 
rationals, 91 
integral test, 290 
integration 

by parts, 300-302 
laws, 275, 280 

piecewise constant, 273, 274 
Riemann: see Riemann 
integral 
interchanging 

derivatives with derivatives, 
9 

finite sums with finite 
sums, 162, 163 
integrals with integrals, 1 
limits with derivatives, 8 
limits with integrals, 8 
limits with length, 11 
limits with limits, 7, 8 
sums with sums, 5, 188 
intermediate value theorem, 238 
intersection 

pairwise, 41 
interval, 212 
inverse 

function theorem, 262 
image, 56 
in logic, 316 
of functions, 55 
irrationality, 95 
of √2, 91, 120
isolated point, 215 

jump discontinuity, 233 

l^1, l^2, l^∞, L^1, L^2, L^∞

see also: supremum as 
norm 


L’Hôpital’s rule, 10, 264
label, 60 
laws of algebra 
for integers, 78 
for rationals, 84 
for reals, 106 

laws of exponentiation, 89, 90, 
122, 125 

least upper bound, 117 
least upper bound 
property, 117, 137 
see also: supremum 
Leibniz rule, 255 
lemma, 25 

length of interval, 269 
limit 

at infinity, 250 
formal (LIM), 103, 130 
laws, 131, 223 
left and right, 231 
limiting values of functions, 
4, 220 

of sequences, 129 
uniqueness of, 128, 223 
limit inferior: see limit superior
limit point 

of sequences, 139 
of sets, 215 
limit superior, 141 
linearity 

of finite series, 161 
of infinite series, 168 
of integration, 274, 280 
of limits, 131 
Lipschitz constant, 260 
Lipschitz continuous, 260 
logical connective, 309 
lower bound: see upper bound 

majorize, 276
maximum, 203, 258
local, 258 

of functions, 219, 236 
principle, 236 
mean value theorem, 259 
meta-proof, 122 
metric 

see also: distance 
minimum, 203, 258 
local, 258 

of a set of natural numbers, 
183 

of functions, 219, 236 
minorize: see majorize 
monotone (increasing or 
decreasing) 
function, 241, 294 
sequence, 138 
morphism: see function 
multiplication 

of cardinals, 71 
of functions, 220 
of integers, 76 
of natural numbers, 29 
of rationals, 82 
of reals, 105 

Natural numbers N 
are infinite, 70 
axioms: see Peano axioms 
identification with integers, 
77 

in set theory: see Axiom of 
infinity 

informal definition, 15 
uniqueness of, 67 
negation 

in logic, 310 
of extended reals, 134 
of integers, 77 


of rationals, 82 
of reals, 106 

negative: see negation, positive 
Newton’s approximation, 253 
non-constructive, 200 

objects, 34 

primitive, 47 

one-to-one correspondence: see 
bijection 

one-to-one function, 54 

onto, 54 

open 

interval, 212 
or: see disjunction 
order ideal, 207 
ordered n-tuple, 62 
ordered pair, 62 

construction of, 65 
ordering 

lexicographical, 208 
of cardinals, 198 
of orderings, 209 
of partitions, 270 
of sets, 203 

of the extended reals, 134 
of the integers, 80 
of the natural numbers, 27 
of the rationals, 85 
of the reals, 112 
oscillatory discontinuity, 234 

pair set, 36 

partial function, 61 

partial sum, 165 

partially ordered set, 40, 202 

partition, 269 

Peano axioms, 16-19, 21 

perfect matching: see bijection 

piecewise
constant, 272

constant Riemann-Stieltjes 
integral, 293 
continuous, 288 
pigeonhole principle, 73 
polynomial, 231 
positive 

integer, 77 
natural number, 27 
rational, 85 
real, 112 
power set, 58 

principle of infinite descent, 93 
principle of mathematical 
induction, 19 
backwards induction, 29 
strong induction, 28, 204 
transfinite, 207 
product rule: see Leibniz rule
proof 

abstract examples, 
317-320, 327-328 
by contradiction, 307, 316 
proper subset, 39 
property, 309 
proposition, 25 
propositional logic, 320 

quantifier, 322 

existential (for some), 323 
negation of, 325 
nested, 324 
universal (for all), 322 
Quotient rule, 255 
Quotient: see division 

range, 49 
ratio test, 180 
rational numbers Q 
definition, 82 


identification with reals, 

106 

interspersing with 
rationals, 91 
interspersing with reals, 

115 

real numbers R 
definition, 103 
rearrangement 

of absolutely convergent 
series, 175 

of divergent series, 176, 193 
of finite series, 160 
of non-negative series, 174 
reciprocal 

of rationals, 84 
of reals, 109 

recursive definitions, 23, 67 
reductio ad absurdum: see 

proof by contradiction 
removable discontinuity: see 
removable singularity 
removable singularity, 225, 233 
restriction of functions, 218 
Riemann hypothesis, 173 
Riemann integrability, 277 

closure properties, 280-285 
failure of, 291 
of bounded continuous 
functions, 287 
of continuous functions on 
compacta, 287 
of monotone functions, 289 
of piecewise continuous 
bounded functions, 288 
of uniformly continuous 
functions, 285 
Riemann integral, 277 
upper and lower, 277
lower), 280

Riemann zeta function, 173 
Riemann-Stieltjes integral, 294 
ring, 78 

commutative, 78 
Rolle’s theorem, 259 
root, 122 
test, 178 

Russell’s paradox, 46 

scalar multiplication 
of functions, 220 
Schröder-Bernstein theorem, 198

sequence, 96 
finite, 64 
series 

finite, 155, 157 
formal infinite, 164 
laws, 168, 192 
on arbitrary sets, 192 
on countable sets, 188 
vs. sum, 156 
set 

axioms: see axioms of set 
theory 

informal definition, 34 
signum function, 225
singleton set, 36 
singularity, 234 
square root, 50 
Squeeze test 

for sequences, 145 
statement, 306 
strict upper bound, 204 
subsequence, 149 
subset, 39 

substitution: see rearrangement 


subtraction 

formal (−−), 76

of functions, 220 
of integers, 80 
sum rule, 255 
supremum (and infimum) 

of a set of extended reals, 
136, 137 

of a set of reals, 119, 121 
of sequences of reals, 137 
surjection: see onto 

tangent: see trigonometric 
function 

telescoping series, 169 
ten, 332 
theorem, 25 

totally ordered set, 40, 203 
transformation: see function 
triangle inequality 

for finite series, 157, 161 
in R, 87 

trichotomy of order 
for integers, 81 
for natural numbers, 27 
for rationals, 85 
for reals, 112 
of extended reals, 135 
two-to-one function, 54 

uncountability, 181 
of the reals, 196
uniform continuity, 244
upper bound
of a set of reals, 116

variable, 320

well ordering principle
for arbitrary sets, 210 
for natural numbers, 183 
well-defined, 306 


well-ordered sets, 204 

Zermelo-Fraenkel(-Choice) 
axioms, 61 

see also: axioms of set 
theory 

zero test 

for sequences, 145 
for series, 166 
Zorn’s lemma, 206